Boot failed after patch "mtd: rawnand: Support for sequential cache reads"
Alexander Shiyan
eagle.alexander923 at gmail.com
Sun May 28 23:10:32 PDT 2023
Hello Miquel.
On Fri, 26 May 2023 at 21:14, Miquel Raynal <miquel.raynal at bootlin.com> wrote:
> Hi Alexander,
> eagle.alexander923 at gmail.com wrote on Thu, 25 May 2023 10:48:39 +0300:
> > Hello.
> > Kernel boot fails after patch "mtd: rawnand: Support for sequential
> > cache reads" (thanks to git bisect).
> > Please advise what can be done here and where to look for a bug.
> Thanks for the report, and sorry for the trouble. Right now I don't
> know what's wrong with the driver but as a first step, you could just
> try to reset chip->controller->supported_op.cont_read after
> rawnand_check_cont_read_support(). It should just avoid using the
> optimization and solve the boot. That's of course a very early fix, we
> now need to understand further what's going on.
When I comment out the line "rawnand_check_cont_read_support(chip);"
the booting works as expected.
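For reference, the workaround Miquel suggests amounts to clearing the flag right after the check. Below is only a sketch of that idea: the `_sketch` structures and the stubbed check are hypothetical stand-ins, not the kernel's definitions, and the stub simply assumes the check would enable the feature. In the real driver the field is chip->controller->supported_op.cont_read.

```c
#include <stdbool.h>

/* Simplified stand-ins for the kernel structures (hypothetical, illustration only). */
struct nand_controller_sketch {
	struct {
		unsigned int cont_read:1;	/* mirrors supported_op.cont_read */
	} supported_op;
};

struct nand_chip_sketch {
	struct nand_controller_sketch *controller;
};

/*
 * Stub standing in for rawnand_check_cont_read_support().
 * Assumption: the real check decided that sequential cache reads are usable.
 */
static void check_cont_read_support_sketch(struct nand_chip_sketch *chip)
{
	chip->controller->supported_op.cont_read = 1;
}

/* The suggested workaround: run the check, then force the optimization off. */
static void apply_workaround_sketch(struct nand_chip_sketch *chip)
{
	check_cont_read_support_sketch(chip);
	chip->controller->supported_op.cont_read = 0;
}

/* Helper to query the flag. */
static bool cont_read_enabled_sketch(const struct nand_chip_sketch *chip)
{
	return chip->controller->supported_op.cont_read != 0;
}
```

Commenting out the call, as I did, should have the same effect as long as the flag starts out cleared.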
> My first guess would be that the sequential read patterns are not
> supported by the controller or badly implemented by its driver. But
> that is strange given the simplicity of this controller. This
> controller is meant to be versatile, I doubt it does not support these
> operations. Plus, I would expect page accesses to be directly
> implemented by the driver and not be affected by this logic. Could you
> try to trace the actual calls which are made through the mtd layer
> which lead to these errors? Is ->exec_op() involved in the process?
> Where? How?
Yes. Everything goes as expected here: debugging shows that the correct
opcodes are passed; for NAND_CMD_READCACHESEQ it is 0x31.
> Also, what kernel are you using exactly? I'm surprised there is no
> mtd-related error. If you reboot with an older kernel, you get your
> data, right?
Right. This bug appeared in Linux 6.3. For 6.2 everything worked as expected,
so I used "git bisect" to find the point where the error occurs.
> Otherwise maybe the Micron chip is at fault. Which would mean that
> there are unsupported commands. I believed they were all standard,
> maybe some of them are optional? Could you check in the chip datasheet
> if there is any command used there that is unsupported?
According to the MT29F2G08ABAEAWP datasheet, the chip supports
the READ PAGE CACHE SEQUENTIAL opcode, but with two caveats:
4. These commands supported only with ECC disabled.
5. Issuing a READ PAGE CACHE series (31h, 00h-31h, 3Fh) command
when the array is busy (RDY = 1, ARDY = 0) is supported if the previous
command was a READ PAGE (00h-30h) or READ PAGE CACHE series
command; otherwise, it is prohibited.
As far as I understand, the second caveat (note 5) is satisfied, since we
issue the correct sequence.
The first caveat (note 4), however, may be the problem in this case.
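The ordering rule from note 5 can be checked mechanically. Here is a rough sketch, assuming the opcode values quoted above (00h-30h for READ PAGE, 31h for READ PAGE CACHE SEQUENTIAL, 3Fh to end the series). The function and macro names are made up for illustration. Address and data cycles are omitted, READ PAGE CACHE RANDOM (00h-31h) is not modeled, and the ECC restriction from note 4 is not checked at all.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Opcode values taken from the datasheet note; macro names are hypothetical. */
#define CMD_READ_PAGE_1ST	0x00	/* READ PAGE, first cycle */
#define CMD_READ_PAGE_2ND	0x30	/* READ PAGE, second cycle */
#define CMD_READ_CACHE_SEQ	0x31	/* READ PAGE CACHE SEQUENTIAL */
#define CMD_READ_CACHE_LAST	0x3f	/* last command of the cache series */

/*
 * Fill buf with the opcodes for reading npages consecutive pages:
 * 00h-30h, then 31h before every page except the last, then 3Fh.
 * Returns the number of opcodes written, or 0 if npages < 2 or
 * buf is too small.
 */
static size_t build_seq_read_sketch(uint8_t *buf, size_t len, size_t npages)
{
	size_t n = 0, i;

	if (npages < 2 || len < npages + 2)
		return 0;

	buf[n++] = CMD_READ_PAGE_1ST;
	buf[n++] = CMD_READ_PAGE_2ND;
	for (i = 0; i + 1 < npages; i++)
		buf[n++] = CMD_READ_CACHE_SEQ;
	buf[n++] = CMD_READ_CACHE_LAST;
	return n;
}

static bool is_cache_cmd(uint8_t op)
{
	return op == CMD_READ_CACHE_SEQ || op == CMD_READ_CACHE_LAST;
}

/*
 * Note 5, simplified: every cache command must directly follow a
 * completed READ PAGE (30h) or another cache series command.
 */
static bool seq_obeys_note5_sketch(const uint8_t *ops, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (!is_cache_cmd(ops[i]))
			continue;
		if (i == 0)
			return false;
		if (ops[i - 1] != CMD_READ_PAGE_2ND && !is_cache_cmd(ops[i - 1]))
			return false;
	}
	return true;
}
```

For three pages this yields 00h, 30h, 31h, 31h, 3Fh, which passes the check. A 31h issued right after an unrelated command (for example READ STATUS, 70h) is rejected.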
> > ...
> > omap-gpmc 50000000.gpmc: GPMC revision 6.0
> > ...
> > nand: device found, Manufacturer ID: 0x2c, Chip ID: 0xda
> > nand: Micron MT29F2G08ABAEAWP
> > nand: 256 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64
> > nand: using OMAP_ECC_BCH8_CODE_HW ECC scheme
> > ...
> > VFS: Mounted root (squashfs filesystem) readonly on device 254:0.
> > devtmpfs: mounted
> > Freeing unused kernel image (initmem) memory: 1024K
> > Run /sbin/init as init process
> > SQUASHFS error: lzo decompression failed, data probably corrupt
> > SQUASHFS error: Failed to read block 0xd291c2: -5
> > SQUASHFS error: lzo decompression failed, data probably corrupt
> > SQUASHFS error: Failed to read block 0xd291c2: -5
> > SQUASHFS error: Unable to read data cache entry [d291c2]
> > SQUASHFS error: Unable to read page, block d291c2, size 14307
> > SQUASHFS error: Unable to read data cache entry [d291c2]
> > SQUASHFS error: Unable to read page, block d291c2, size 14307
> > Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000007
> > CPU: 0 PID: 1 Comm: init Not tainted 6.3.0+ #105
> > Hardware name: Generic AM33XX (Flattened Device Tree)
> > unwind_backtrace from show_stack+0xb/0xc
> > show_stack from dump_stack_lvl+0x2b/0x34
> > dump_stack_lvl from panic+0xbd/0x230
> > panic from make_task_dead+0x1/0x120
> > make_task_dead from 0xc102ca80
> > ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000007 ]---