[EXT] Re: Boot failed after patch "mtd: rawnand: Support for sequential cache reads"

Bean Huo beanhuo at micron.com
Thu Jun 1 09:55:42 PDT 2023


Hi Miquel,

As you mentioned, there is no ECC error, yet SquashFS complains: Unable to read data cache.
We would like to see I/Ox, RE#, WE# and R/B#, to check whether command input and data output behave properly.
It would be best to capture the 31h command and the data that follows it.

Kind regards,
Bean
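
For anyone matching a scope capture against the expected traffic, here is a minimal Python sketch of the command order Miquel describes further down the thread (READ0, READSTART, then READCACHESEQ per page and READCACHEEND for the last one). The opcode values are the ones quoted in the thread; the function name `expected_bus_cycles` and the (kind, value) event format are made up for illustration, and address cycles and R/B# waits are deliberately left out.

```python
# Opcode values as quoted in the thread; the helper itself is hypothetical.
READ0 = 0x00
READSTART = 0x30
READCACHESEQ = 0x31
READCACHEEND = 0x3F


def expected_bus_cycles(npages):
    """Return ordered (kind, value) events for reading npages consecutive pages.

    kind is "cmd" (value is the opcode) or "data" (value is the index of the
    page being clocked out). Address cycles and R/B# waits are omitted.
    """
    if npages < 1:
        raise ValueError("npages must be >= 1")
    events = [("cmd", READ0), ("cmd", READSTART)]
    if npages == 1:
        # A single page needs no cache command: plain READ PAGE (00h-30h).
        events.append(("data", 0))
        return events
    for page in range(npages):
        last = page == npages - 1
        # 31h for every page but the last one, 3Fh to close the series.
        events.append(("cmd", READCACHEEND if last else READCACHESEQ))
        events.append(("data", page))
    return events


if __name__ == "__main__":
    for kind, value in expected_bus_cycles(3):
        if kind == "cmd":
            print(f"cmd  {value:#04x}")
        else:
            print(f"data page {value}")
```

On a capture, each 31h (or the final 3Fh) should be followed by a busy period on R/B# and then one page worth of RE# pulses, assuming the controller issues the series this way.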

> -----Original Message-----
> From: Miquel Raynal <miquel.raynal at bootlin.com>
> Sent: Thursday, June 1, 2023 9:38 AM
> To: Alexander Shiyan <eagle.alexander923 at gmail.com>
> Cc: Bean Huo <beanhuo at micron.com>; JaimeLiao <jaimeliao.tw at gmail.com>;
> linux-mtd at lists.infradead.org
> Subject: Re: [EXT] Re: Boot failed after patch "mtd: rawnand: Support for sequential
> cache reads"
> 
> 
> Hello,
> 
> eagle.alexander923 at gmail.com wrote on Wed, 31 May 2023 12:02:08 +0300:
> 
> > Hello.
> >
> > I'm not sure what I should have measured or how to catch the right moment.
> > Here is what I captured: the first shot was taken immediately after the
> > board was started, the second just before the kernel crashed. Yellow
> > trace: R/~B, blue trace: AD1.
> 
> Bean, can you be more specific about what timings you need to see?
> Where do you think we might have a timing issue? Maybe we can just add delays and
> see if we get different results. I did not understand which part of the document
> linked below I should look at specifically.
> 
> What bothers me, though, is the absence of ECC errors, as if the data were fine.
> Alexander, how is it possible that the NAND controller does not complain about
> errors while squashfs does?
> 
> Can you also dump a buffer and compare with what you expect? Is the data fully
> random? Smashed somewhere specific...?
> 
> Thanks,
> Miquèl
> 
> > Tue, 30 May 2023 at 17:49, Bean Huo <beanhuo at micron.com>:
> > >
> > > Hi Miquel,
> > >
> > > Thanks for reaching out to me. Here is TN-29-01: Increasing NAND Flash Performance.
> > > https://media-www.micron.com/-/media/client/global/documents/products/technical-note/nand-flash/tn2901.pdf?rev=a228fd154c274ef78669b67ea097ecda
> > >
> > >
> > > If Alexander can trace the parallel NAND bus with an oscilloscope, that
> > > would help to check whether this is a timing issue.
> > >
> > > Kind regards,
> > > Bean
> > >
> > > > -----Original Message-----
> > > > From: Miquel Raynal <miquel.raynal at bootlin.com>
> > > > Sent: Monday, May 29, 2023 1:46 PM
> > > > To: Alexander Shiyan <eagle.alexander923 at gmail.com>
> > > > Cc: JaimeLiao <jaimeliao.tw at gmail.com>;
> > > > linux-mtd at lists.infradead.org; Bean Huo <beanhuo at micron.com>
> > > > Subject: [EXT] Re: Boot failed after patch "mtd: rawnand: Support
> > > > for sequential cache reads"
> > > >
> > > >
> > > > Hi Bean,
> > > >
> > > > I'm adding you to this thread because I'm clueless regarding
> > > > what's happening to Alexander.
> > > >
> > > > Short recap: sequential page reads seem to fail with an MT29F Micron chip.
> > > > Alexander is using a GPMC controller without on-die ECC. I doubt
> > > > the error comes from the controller, so I would like to know if
> > > > there is anything known to fail with these chips regarding the use
> > > > of sequential reads. We can easily work around that situation if we
> > > > identify the problem.
> > > >
> > > > Thanks a lot,
> > > > Miquèl
> > > >
> > > > eagle.alexander923 at gmail.com wrote on Mon, 29 May 2023
> > > > 13:33:04 +0300:
> > > >
> > > > > Mon, 29 May 2023 at 11:12, Miquel Raynal <miquel.raynal at bootlin.com>:
> > > > > ...
> > > > > > > > According to the MT29F2G08ABAEAWP datasheet, the chip supports
> > > > > > > the READ PAGE CACHE SEQUENTIAL opcode, but with two caveats:
> > > > > > >  4. These commands supported only with ECC disabled.
> > > > > > >  5. Issuing a READ PAGE CACHE series (31h, 00h-31h, 3Fh) command
> > > > > > >   when the array is busy (RDY = 1, ARDY = 0) is supported if the previous
> > > > > > >   command was a READ PAGE (00h-30h) or READ PAGE CACHE series
> > > > > > >   command; otherwise, it is prohibited.
> > > > > > >
> > > > > > > As far as I understand, the second remark suits us, since we
> > > > > > > create the correct sequence.
> > > > > >
> > > > > > Exactly, we do:
> > > > > >
> > > > > >         READ0 (0), READSTART (30),
> > > > > >         READCACHESEQ (31), data,
> > > > > >         READCACHESEQ (31), data,
> > > > > >         ...
> > > > > >         READCACHEEND (3f), data.
> > > > > >
> > > > > > which is what the datasheet tells us I believe.
> > > > > >
> > > > > > > But the first remark can be a problem in this case.
> > > > > >
> > > > > > I was not aware of this limitation, it's only written in the
> > > > > > summary, not in the details about the commands, nice finding.
> > > > > > We need to prevent on-die ECC users from enabling this feature.
> > > > > >
> > > > > > But given the below trace, you're not using the on-die ECC
> > > > > > engine, right? It looks like you're using the controller's ELM
> > > > > > engine to perform ECC correction, so I don't see why this
> > > > > > specific limitation would hit us. Can you confirm the ECC engine of the
> > > > > > chip is disabled?
> > > > >
> > > > > Yes, on-die ECC is disabled.
> > > > > Please advise where I can insert some debug messages to clear things up.
> > > > >
> > > > > > > > > > omap-gpmc 50000000.gpmc: GPMC revision 6.0
> > > > > > > > > > ...
> > > > > > > > > > nand: device found, Manufacturer ID: 0x2c, Chip ID: 0xda
> > > > > > > > > > nand: Micron MT29F2G08ABAEAWP
> > > > > > > > > > nand: 256 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64
> > > > > > > > > > nand: using OMAP_ECC_BCH8_CODE_HW ECC scheme
> > > > > > > > > > ...
> > > > > > > > > > VFS: Mounted root (squashfs filesystem) readonly on device 254:0.
> > > > > > > > > > devtmpfs: mounted
> > > > > > > > > > Freeing unused kernel image (initmem) memory: 1024K
> > > > > > > > > > Run /sbin/init as init process
> > > > > > > > > > SQUASHFS error: lzo decompression failed, data probably corrupt
> > > > > > > > > > SQUASHFS error: Failed to read block 0xd291c2: -5
> > > > > > > > > > SQUASHFS error: lzo decompression failed, data probably corrupt
> > > > > > > > > > SQUASHFS error: Failed to read block 0xd291c2: -5
> > > > > > > > > > SQUASHFS error: Unable to read data cache entry [d291c2]
> > > > > > > > > > SQUASHFS error: Unable to read page, block d291c2, size 14307
> > > > > > > > > > SQUASHFS error: Unable to read data cache entry [d291c2]
> > > > > > > > > > SQUASHFS error: Unable to read page, block d291c2, size 14307
> > > > > > > > > > Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000007
> > > > > > > > > > CPU: 0 PID: 1 Comm: init Not tainted 6.3.0+ #105
> > > > > > > > > > Hardware name: Generic AM33XX (Flattened Device Tree)
> > > > > > > > > >  unwind_backtrace from show_stack+0xb/0xc
> > > > > > > > > >  show_stack from dump_stack_lvl+0x2b/0x34
> > > > > > > > > >  dump_stack_lvl from panic+0xbd/0x230
> > > > > > > > > >  panic from make_task_dead+0x1/0x120
> > > > > > > > > >  make_task_dead from 0xc102ca80
> > > > > > > > > > ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000007 ]---
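
As a rough aid for reading the SQUASHFS messages above: the failing read starts at byte offset 0xd291c2 and is 14307 bytes long, so it cannot fit in one 2048-byte page. The sketch below works out how many page-sized units it touches. It assumes the offset is relative to the start of the block device squashfs is mounted on, and it says nothing about where those units land on the raw NAND, since that depends on the layers in between. `page_span` is a hypothetical helper name.

```python
PAGE_SIZE = 2048  # "page size: 2048" in the boot log


def page_span(offset, size, page_size=PAGE_SIZE):
    """Return (first_unit, last_unit, count) of page-sized units covered by
    the byte range [offset, offset + size)."""
    if size <= 0:
        raise ValueError("size must be positive")
    first = offset // page_size
    last = (offset + size - 1) // page_size
    return first, last, last - first + 1


if __name__ == "__main__":
    first, last, count = page_span(0xD291C2, 14307)
    print(f"units {first}..{last} ({count} in total)")
```

With the values from the log this gives 8 consecutive page-sized units (indices 6738 to 6745), which is the kind of multi-page read where a sequential cache read could come into play.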
> 
> 
> Thanks,
> Miquèl
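
One possible way to answer the question of whether the corruption is random or localized is to compare a dump read with the patch applied against a known-good copy of the same region and list the differing byte ranges. The sketch below does that in plain Python; `mismatch_runs` and `locate` are hypothetical helper names, and the 2048-byte page size is taken from the boot log.

```python
def mismatch_runs(expected, actual):
    """Return [(start, length), ...] for each contiguous run of differing bytes."""
    if len(expected) != len(actual):
        raise ValueError("buffers must have the same length")
    runs = []
    start = None
    for i, (e, a) in enumerate(zip(expected, actual)):
        if e != a:
            if start is None:
                start = i  # a new run of differing bytes begins here
        elif start is not None:
            runs.append((start, i - start))
            start = None
    if start is not None:
        # The buffers still differ at the very last byte.
        runs.append((start, len(expected) - start))
    return runs


def locate(runs, page_size=2048):
    """Map each run to (page index, offset within page, length).

    Only the start of a run is located; a run may extend into the next page.
    """
    return [(start // page_size, start % page_size, length)
            for start, length in runs]
```

A few short runs scattered everywhere would point toward bit errors, while whole pages differing, or damage that always starts at a page boundary, would point toward the read sequence itself.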

