mtd: nand: raw: Possible bug in nand_onfi_detect()?

Miquel Raynal miquel.raynal at bootlin.com
Tue May 7 09:08:08 PDT 2024


Hello,

ada at thorsis.com wrote on Wed, 6 Mar 2024 15:36:04 +0100:

> Hello everyone,
> 
> I think I found a bug in nand_onfi_detect() which was introduced with
> commit c27842e7e11f ("mtd: rawnand: onfi: Adapt the parameter page
> read to constraint controllers") back in 2020.
> 
> Background on how I found this: I'm currently struggling to get raw
> NAND flash access working on an at91 sam9x60 SoC with a S34ML02G1
> Spansion SLC raw NAND flash on a custom board.  The setup is
> comparable to the sam9x60 curiosity board and can be reproduced with
> that one.
> 
> NAND flash on sam9x60 curiosity board works fine with what is in
> mainline Linux kernel.  However after removing the line 'rb-gpios =
> <&pioD 5 GPIO_ACTIVE_HIGH>;' from at91-sam9x60_curiosity.dts all data
> read from the flash appears to be zeros only.  (I did not add that
> line to the dts of my custom board first, this is how I stumbled over
> this.)
> 
> I have no explanation for that behaviour; it should work without R/B#
> by reading the status register.  Maybe we can investigate that
> in depth later.  However, those all-zeros reads happen when
> reading the ONFI param page as well as when reading data from the
> OOB/spare area later, and I bet it's the same with regular data.
> 
> This read error reveals a bug in nand_onfi_detect().  After setting
> up some things there's this for loop:
> 
>     for (i = 0; i < ONFI_PARAM_PAGES; i++) {
> 
> For i = 0 nand_read_param_page_op() is called and in my case all zeros
> are returned, so the calculated CRC does not match the all-zeros
> CRC read.  The usual break after successfully reading the first page
> is therefore skipped, and nand_change_read_column_op() is called to
> read the second page.  I think that one always fails on this line:
> 
>     if (offset_in_page + len > mtd->writesize + mtd->oobsize) {
> 
> Those variables contain the following values:
> 
>     offset_in_page: 256
>     len: 256
>     mtd->writesize: 0
>     mtd->oobsize: 0
> 
> The condition is true and nand_change_read_column_op() returns with
> -EINVAL, because mtd->writesize and mtd->oobsize are not set yet in
> that code path.  Those are probably initialized later, maybe with
> parameters read from that ONFI param page?
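
As an illustration of the check quoted above, here is a minimal
standalone sketch in plain C (not kernel code).  The struct fake_mtd
and check_read_column() names are hypothetical stand-ins that model
only the two fields and the one comparison from the report; the real
nand_change_read_column_op() does considerably more.

```c
#include <errno.h>

/* Hypothetical stand-in for the two struct mtd_info fields the check uses. */
struct fake_mtd {
	unsigned int writesize;	/* main area size in bytes */
	unsigned int oobsize;	/* spare/OOB area size in bytes */
};

/*
 * Models only the range check quoted in the report: a column read must
 * fit inside the page plus its OOB area, otherwise -EINVAL is returned.
 */
static int check_read_column(const struct fake_mtd *mtd,
			     unsigned int offset_in_page, unsigned int len)
{
	if (offset_in_page + len > mtd->writesize + mtd->oobsize)
		return -EINVAL;

	return 0;
}
```

With writesize and oobsize both still 0, a request for the second
parameter page (offset 256, length 256) gives 512 > 0 and is rejected.
With a geometry filled in (for example 2048 + 64 bytes, used here only
as an illustrative value), the same request passes the check.
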
> 
> Returning with error from nand_change_read_column_op() leads to
> jumping out of nand_onfi_detect() early, and no ONFI param page is
> evaluated at all, although the second or third page could be intact.
> 
> I guess this would also fail for any other cause of mismatching
> CRCs in the first page, but I have no faulty NAND flash chip to
> confirm that.
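
To make the CRC mismatch on an all-zeros page concrete, here is a small
standalone sketch in plain C.  It is written from the ONFI CRC-16
definition (polynomial 0x8005, initial value 0x4F4E, computed over the
first 254 bytes of a 256-byte parameter page, stored little-endian in
the last two bytes).  The function and macro names are made up for
this example and it is not the kernel implementation.

```c
#include <stddef.h>
#include <stdint.h>

#define ONFI_CRC_INIT	0x4F4E	/* initial CRC value defined by ONFI */
#define ONFI_CRC_POLY	0x8005	/* CRC-16 polynomial defined by ONFI */
#define PARAM_PAGE_LEN	256	/* one copy of the parameter page */

/* Bitwise, MSB-first CRC-16 over len bytes, starting from crc. */
static uint16_t onfi_crc16_sketch(uint16_t crc, const uint8_t *p, size_t len)
{
	while (len--) {
		crc ^= (uint16_t)(*p++ << 8);
		for (int i = 0; i < 8; i++)
			crc = (uint16_t)((crc << 1) ^
					 ((crc & 0x8000) ? ONFI_CRC_POLY : 0));
	}

	return crc;
}

/* Returns 1 if the CRC stored in bytes 254..255 matches the computed one. */
static int param_page_crc_ok(const uint8_t page[PARAM_PAGE_LEN])
{
	uint16_t stored = (uint16_t)(page[254] | (page[255] << 8));

	return onfi_crc16_sketch(ONFI_CRC_INIT, page, 254) == stored;
}
```

Because the initial value is non-zero, the CRC computed over 254 zero
bytes is itself non-zero, so a page that reads back as all zeros
(stored CRC 0x0000) can never pass this check.  That is why the loop
in the report always moves on to the second copy.
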

Sorry for the time it took on my side.

Here is a link to another similar report:
https://lore.kernel.org/linux-mtd/DM6PR05MB4506554457CF95191A670BDEF7062@DM6PR05MB4506.namprd05.prod.outlook.com/
And here is a link to the series attempting to fix this:
https://lore.kernel.org/linux-mtd/20240507160546.130255-1-miquel.raynal@bootlin.com/T/#t

Thanks,
Miquèl