gpmc-nand broken since v4.12

Boris Brezillon boris.brezillon at free-electrons.com
Mon Oct 16 06:50:24 PDT 2017


On Mon, 16 Oct 2017 14:39:04 +0200
Boris Brezillon <boris.brezillon at free-electrons.com> wrote:

> On Mon, 16 Oct 2017 15:12:38 +0300
> Roger Quadros <rogerq at ti.com> wrote:
> 
> > Hi Boris,
> > 
> > 
> > Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki. Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki
> > 
> > On 16/10/17 14:34, Boris Brezillon wrote:  
> > > Hi Roger,
> > > 
> > > On Mon, 16 Oct 2017 13:22:04 +0300
> > > Roger Quadros <rogerq at ti.com> wrote:
> > >     
> > >> 
> > >> Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki. Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki
> > >>
> > >> On 13/10/17 23:29, Boris Brezillon wrote:    
> > >>> On Fri, 13 Oct 2017 22:24:58 +0200
> > >>> Boris Brezillon <boris.brezillon at free-electrons.com> wrote:
> > >>>       
> > >>>> On Fri, 13 Oct 2017 14:50:33 +0200
> > >>>> Boris Brezillon <boris.brezillon at free-electrons.com> wrote:
> > >>>>      
> > >>>>> On Fri, 13 Oct 2017 14:57:29 +0300
> > >>>>> Roger Quadros <rogerq at ti.com> wrote:
> > >>>>>         
> > >>>>>> Hi Boris,
> > >>>>>>
> > >>>>>> NAND on gpmc-omap breaks for me while doing a unmount of a ubi volume since v4.12
> > >>>>>>
> > >>>>>> Behaviour follows through in v4.13 and v4.14-rc as well.
> > >>>>>>
> > >>>>>> Do you have any idea about this?        
> > >>>>
> > >>>> More info on what has changed in 4.12: we no longer allocate the
> > >>>> nand_buffer struct + its internal buffer using a single big kmalloc
> > >>>> call, which means the nand_buffer struct is now likely to be allocated
> > >>>> in a small object slab (sizeof(nand_buffers) = 12). If you have a
> > >>>> use-after-free bug somewhere in the kernel, it might corrupt the      
> > >>
> > >> Spot on. Problem starts from commit 4d5dea5725f3aa95b9b64e252540e435dd216dbc
> > >> "mtd: nand: allocate aligned buffers if NAND_OWN_BUFFERS is unset"
> > >>    
> > >>>
> > >>> I meant buffer-overflow, not use-after-free.      
> > >>
> > >> Your analysis seems correct.
> > >>    
> > >>>       
> > >>>> nand_buffers object, which might explain the bug you see here.
> > >>>>      
> > >>>>>
> > >>>>> Can you try with this patch [1] applied and paste me the values printed
> > >>>>> just before the crash?
> > >>>>>
> > >>>>> [1]http://code.bulix.org/lc8xk0-209746      
> > >>
> > >> == unmounting volume
> > >> [   36.308792] nand: nand_write_subpage_hwecc:2575 ecc_calc =   (null) oob_poi = ed096800
> > >> [   36.317319] mtd_ooblayout_set_bytes:1330 dst = ed096802 src =   (null)
> > >> [   36.324162] Unable to handle kernel NULL pointer dereference at virtual address 00000000
> > >>
> > >>
> > >> Running the following patch
> > >> https://hastebin.com/ulogurutuz.php
> > >> shows
> > >>
> > >> [   37.260689] nand: nand_write_subpage_hwecc:2547:A ecc_calc = ed116e40 oob_poi = ed117800
> > >> [   37.260772] nand: nand_write_subpage_hwecc:2556:A1:step 0, ecc_calc = ed116e40
> > >> [   37.260779] nand: nand_write_subpage_hwecc:2562:A2:step 0, ecc_calc = ed116e40
> > >> [   37.260834] nand: nand_write_subpage_hwecc:2556:A1:step 1, ecc_calc = ed116e40
> > >> [   37.260840] nand: nand_write_subpage_hwecc:2567:A3-pre:step 1, ecc_calc = ed116e40
> > >> [   37.260846] omap_calculate_ecc_bch
> > >> [   37.260856] nand: nand_write_subpage_hwecc:2570:A3-post:step 1, ecc_calc =   (null)
> > >> [   37.260912] nand: nand_write_subpage_hwecc:2556:A1:step 2, ecc_calc =   (null)
> > >> [   37.260918] nand: nand_write_subpage_hwecc:2562:A2:step 2, ecc_calc =   (null)
> > >> [   37.260972] nand: nand_write_subpage_hwecc:2556:A1:step 3, ecc_calc =   (null)
> > >> [   37.260978] nand: nand_write_subpage_hwecc:2562:A2:step 3, ecc_calc =   (null)
> > >> [   37.260984] nand: nand_write_subpage_hwecc:2587:B ecc_calc =   (null) oob_poi = ed117800
> > >> [   37.260991] mtd_ooblayout_set_bytes:1330 dst = ed117802 src =   (null)
> > >>
> > >> which means omap_calculate_ecc_bch() it the culprit.
> > >>
> > >> Looks like the function calculates and stores ECC for all sectors even if we just want ECC
> > >> for just one sector (sub-page).    
> > > 
> > > Yes, looks like you find the root cause.
> > >     
> > >>
> > >> Is my understanding correct
> > >> - We should not be hooking the multi-sector ECC calculator omap_calculate_ecc_bch() to ecc.calculate
> > >> - provide a new one sector ECC calculator function (for BCH4/8/16) for omap and hook it to ecc.calculate
> > >> OR
> > >> - override nand_read_subpage() and nand_write_subpage() using omap specific implementation (for BCH4/8/16).    
> > > 
> > > Second solution sounds simpler because the ECC sector information is
> > > not passed to ecc->calculate() hook, which means you'd have to extract
> > > it from the ecc_calc pointer:
> > > 
> > > 	(uintptr_t)(ecc_calc - chip->buffers->ecccalc) / chip->ecc.size    
> > 
> > I don't think we need ECC sector number at all if we're always going to do a transfer
> > of one sector of data after calling ecc.hwctl(NAND_ECC_READ/WRITE) and before calling ecc.calculate.
> > 
> > My understanding is that the ECC calculators sector 0 registers will always contain
> > the right content irrespective of which sector we transfer as long as we do,
> > 	ecc.hwctl(mtd, NAND_ECC_READ/WRITE);
> > 	transfer one sector;
> > 	ecc.calculate();  
> 
> Ok, then the first solution sounds good too.
> 
> > 
> > Why isn't there a nand_read_subpage_hwecc()?  
> 
> Probably because nobody ever needed it. Feel free to add it if you
> think this is necessary.

Actually, if you want to make your patch easily backport-able to
stable releases, you'd better go for the omap_nand_write/read_subpage()
approach. The generic solution can be done afterwards.

> 
> > In the current form a subpage read won't work if
> > if ecc->mode is NAND_ECC_HW and controllers expect a ecc.hwctl(NAND_ECC_READ) before
> > calling ecc.calculate().
> > 
> > For the OMAP case I can override both subpage functions.
> > Is there a good way to test if subpage read/writes are working as they should?  
> 
> There's a nandsubpage test [1] in mtd-utils.
> 
> [1]http://git.infradead.org/mtd-utils.git/blob/HEAD:/tests/mtd-tests/nandpagetest.c
> 
> ______________________________________________________
> Linux MTD discussion mailing list
> http://lists.infradead.org/mailman/listinfo/linux-mtd/




More information about the linux-mtd mailing list