[EXT] Re: Boot failed after patch "mtd: rawnand: Support for sequential cache reads"

Miquel Raynal miquel.raynal at bootlin.com
Thu Jun 1 00:37:34 PDT 2023


Hello,

eagle.alexander923 at gmail.com wrote on Wed, 31 May 2023 12:02:08 +0300:

> Hello.
> 
> I'm not sure what I should have measured or how to catch the right moment.
> Here's what happened: the first shot was taken immediately after the
> board was launched, the second before the kernel crashed.
> Yellow trace: R/~B, blue: AD1.

Bean, can you be more specific about which timings you need to see?
Where do you think we might have a timing issue? Maybe we can just add
delays and see if we get different results. I did not understand which
part of the document linked below we should look at specifically.

What bothers me, though, is the absence of any ECC error, as if the
data were fine. Alexander, how is it possible that the NAND controller
does not complain about errors while squashfs does?

Can you also dump a buffer and compare with what you expect? Is the
data fully random? Smashed somewhere specific...?
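
A minimal sketch of such a comparison, assuming two dumps of the same
region are already available as byte strings (for example one read with
sequential cache reads and one without; how they are obtained is left
open). The helper name diff_pages is hypothetical, and the 2048-byte
page size is the one reported in the boot log quoted below.

```python
from typing import List, Tuple

PAGE_SIZE = 2048  # page size reported in the boot log for this chip


def diff_pages(expected: bytes, actual: bytes,
               page_size: int = PAGE_SIZE) -> List[Tuple[int, int, int]]:
    """Return (page index, first differing offset in the page,
    number of differing bytes) for every page that differs."""
    mismatches = []
    length = max(len(expected), len(actual))
    for start in range(0, length, page_size):
        exp = expected[start:start + page_size]
        act = actual[start:start + page_size]
        if exp == act:
            continue
        common = min(len(exp), len(act))
        diff_offsets = [i for i in range(common) if exp[i] != act[i]]
        # Bytes present in only one of the two buffers count as differences.
        extra = abs(len(exp) - len(act))
        first = diff_offsets[0] if diff_offsets else common
        mismatches.append((start // page_size, first, len(diff_offsets) + extra))
    return mismatches


# Illustration with made-up data: corrupt two bytes in the second page.
good = bytes(2 * PAGE_SIZE)
bad = bytearray(good)
bad[PAGE_SIZE + 5] = 0xFF
bad[PAGE_SIZE + 6] = 0xFF
for page, offset, count in diff_pages(good, bytes(bad)):
    print(f"page {page}: first difference at offset {offset}, {count} byte(s) differ")
```

Grouping the differences per page should show whether the corruption is
spread everywhere or tied to specific pages, e.g. always the second
page of a multi-page read.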

Thanks,
Miquèl

> Tue, 30 May 2023 at 17:49, Bean Huo <beanhuo at micron.com>:
> >
> > Hi Miquel,
> >
> > Thanks for reaching out to me. Here is TN-29-01: Increasing NAND Flash Performance.
> > https://media-www.micron.com/-/media/client/global/documents/products/technical-note/nand-flash/tn2901.pdf?rev=a228fd154c274ef78669b67ea097ecda
> >
> >
> > If Alexander can trace the parallel NAND bus with an oscilloscope, that would help to check whether this is a timing issue.
> >
> > Kind regards,
> > Bean
> >  
> > > -----Original Message-----
> > > From: Miquel Raynal <miquel.raynal at bootlin.com>
> > > Sent: Monday, May 29, 2023 1:46 PM
> > > To: Alexander Shiyan <eagle.alexander923 at gmail.com>
> > > Cc: JaimeLiao <jaimeliao.tw at gmail.com>; linux-mtd at lists.infradead.org; Bean Huo
> > > <beanhuo at micron.com>
> > > Subject: [EXT] Re: Boot failed after patch "mtd: rawnand: Support for sequential
> > > cache reads"
> > >
> > > Hi Bean,
> > >
> > > I'm adding you to this thread because I'm clueless regarding what's happening to
> > > Alexander.
> > >
> > > Short recap: sequential page reads seem to fail with an MT29F Micron chip.
> > > Alexander is using a GPMC controller without on-die ECC. I doubt the error comes
> > > from the controller, so I would like to know if there is anything known to fail with
> > > these chips regarding the use of sequential reads. We can easily work around that
> > > situation if we identify the problem.
> > >
> > > Thanks a lot,
> > > Miquèl
> > >
> > > eagle.alexander923 at gmail.com wrote on Mon, 29 May 2023
> > > 13:33:04 +0300:
> > >  
> > > > Mon, 29 May 2023 at 11:12, Miquel Raynal <miquel.raynal at bootlin.com>:
> > > > ...
> > > > > > According to the MT29F2G08ABAEAWP datasheet, the chip supports
> > > > > > the READ PAGE CACHE SEQUENTIAL opcode, but with two caveats:
> > > > > >  4. These commands supported only with ECC disabled.
> > > > > >  5. Issuing a READ PAGE CACHE series (31h, 00h-31h, 3Fh) command
> > > > > >   when the array is busy (RDY = 1, ARDY = 0) is supported if the previous
> > > > > >   command was a READ PAGE (00h-30h) or READ PAGE CACHE series
> > > > > >   command; otherwise, it is prohibited.
> > > > > >
> > > > > > As far as I understand, the second remark suits us, since we
> > > > > > create the correct sequence.  
> > > > >
> > > > > Exactly, we do:
> > > > >
> > > > >         READ0 (0), READSTART (30),
> > > > >         READCACHESEQ (31), data,
> > > > >         READCACHESEQ (31), data,
> > > > >         ...
> > > > >         READCACHEEND (3f), data.
> > > > >
> > > > > which is what the datasheet tells us I believe.
> > > > >  
> > > > > > But the first remark can be a problem in this case.  
> > > > >
> > > > > I was not aware of this limitation, it's only written in the
> > > > > summary, not in the details about the commands, nice finding. We
> > > > > need to prevent on-die ECC users from enabling this feature.
> > > > >
> > > > > But given the below trace, you're not using the on-die ECC engine,
> > > > > right? It looks like you're using the controller's ELM engine to
> > > > > perform ECC correction, so I don't see why this specific limitation
> > > > > would hit us. Can you confirm the ECC engine of the chip is disabled?  
> > > >
> > > > Yes, on-die ECC is disabled.
> > > > Please advise where I can insert some debug messages to clear things up.
> > > >  
> > > > > > > > omap-gpmc 50000000.gpmc: GPMC revision 6.0 ...
> > > > > > > > nand: device found, Manufacturer ID: 0x2c, Chip ID: 0xda
> > > > > > > > nand: Micron MT29F2G08ABAEAWP
> > > > > > > > nand: 256 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64
> > > > > > > > nand: using OMAP_ECC_BCH8_CODE_HW ECC scheme ...
> > > > > > > > VFS: Mounted root (squashfs filesystem) readonly on device 254:0.
> > > > > > > > devtmpfs: mounted
> > > > > > > > Freeing unused kernel image (initmem) memory: 1024K
> > > > > > > > Run /sbin/init as init process
> > > > > > > > SQUASHFS error: lzo decompression failed, data probably corrupt
> > > > > > > > SQUASHFS error: Failed to read block 0xd291c2: -5
> > > > > > > > SQUASHFS error: lzo decompression failed, data probably corrupt
> > > > > > > > SQUASHFS error: Failed to read block 0xd291c2: -5
> > > > > > > > SQUASHFS error: Unable to read data cache entry [d291c2]
> > > > > > > > SQUASHFS error: Unable to read page, block d291c2, size 14307
> > > > > > > > SQUASHFS error: Unable to read data cache entry [d291c2]
> > > > > > > > SQUASHFS error: Unable to read page, block d291c2, size 14307
> > > > > > > > Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000007
> > > > > > > > CPU: 0 PID: 1 Comm: init Not tainted 6.3.0+ #105
> > > > > > > > Hardware name: Generic AM33XX (Flattened Device Tree)
> > > > > > > >  unwind_backtrace from show_stack+0xb/0xc
> > > > > > > >  show_stack from dump_stack_lvl+0x2b/0x34
> > > > > > > >  dump_stack_lvl from panic+0xbd/0x230
> > > > > > > >  panic from make_task_dead+0x1/0x120
> > > > > > > >  make_task_dead from 0xc102ca80
> > > > > > > > ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000007 ]---
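
For reference, the command sequence quoted earlier in this thread
(READ0, READSTART, then READCACHESEQ per page and READCACHEEND for the
last one) can be written out as a small sketch. It assumes one data
transfer follows each cache command, as in the quoted listing. The
function names are hypothetical, and the check is only a simplified
reading of datasheet remark 5: it ignores the busy state and the
00h-31h random form.

```python
from typing import List

# Opcode values as given in the quoted listing.
READ0, READSTART = 0x00, 0x30
READCACHESEQ, READCACHEEND = 0x31, 0x3F
CACHE_OPS = (READCACHESEQ, READCACHEEND)


def cache_read_opcodes(num_pages: int) -> List[int]:
    """Opcodes issued, in order, to read num_pages consecutive pages."""
    if num_pages < 1:
        raise ValueError("need at least one page")
    ops = [READ0, READSTART]
    if num_pages == 1:
        return ops  # plain READ PAGE, no cache commands involved
    # One READCACHESEQ per page except the last, which uses READCACHEEND.
    ops += [READCACHESEQ] * (num_pages - 1)
    ops.append(READCACHEEND)
    return ops


def follows_remark_5(ops: List[int]) -> bool:
    """True if every cache command directly follows READ PAGE (00h-30h)
    or another READ PAGE CACHE series command (simplified check)."""
    for i, op in enumerate(ops):
        if op not in CACHE_OPS:
            continue
        prev = ops[i - 1] if i > 0 else None
        after_read_page = (prev == READSTART and i >= 2
                           and ops[i - 2] == READ0)
        after_cache_op = prev in CACHE_OPS
        if not (after_read_page or after_cache_op):
            return False
    return True


print([hex(op) for op in cache_read_opcodes(3)])
print(follows_remark_5(cache_read_opcodes(3)))
```

A sequence that passes this check but still returns bad data would
point away from command ordering and towards timings.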




