nand: WARNING: a0000000.nand: the ECC used on your system (1b/256B) is too weak compared to the one required by the NAND chip (4b/512B)

Miquel Raynal miquel.raynal at bootlin.com
Sat Jun 19 11:40:35 PDT 2021


Hi Christophe,

> >> Now and then I'm using one of the latest kernels (Today is 5.13-rc6), and sometime in one of the 5.x releases, I started to get errors like:
> >>
> >> [    5.098265] ecc_sw_hamming_correct: uncorrectable ECC error
> >> [    5.103859] ubi0 warning: ubi_io_read: error -74 (ECC error) while reading 60 bytes from PEB 99:59824, read only 60 bytes, retry
> >> [    5.525843] ecc_sw_hamming_correct: uncorrectable ECC error
> >> [    5.531571] ecc_sw_hamming_correct: uncorrectable ECC error
> >> [    5.537490] ubi0 warning: ubi_io_read: error -74 (ECC error) while reading 3073 bytes from PEB 107:108976, read only 3073 bytes, retry
> >> [    5.691121] ecc_sw_hamming_correct: uncorrectable ECC error
> >> [    5.696709] ecc_sw_hamming_correct: uncorrectable ECC error
> >> [    5.702426] ecc_sw_hamming_correct: uncorrectable ECC error
> >> [    5.708141] ecc_sw_hamming_correct: uncorrectable ECC error
> >> [    5.714103] ubi0 warning: ubi_io_read: error -74 (ECC error) while reading 3035 bytes from PEB 107:25144, read only 3035 bytes, retry
> >> [   20.523689] random: crng init done
> >> [   21.892130] ecc_sw_hamming_correct: uncorrectable ECC error
> >> [   21.897730] ubi0 warning: ubi_io_read: error -74 (ECC error) while reading 1394 bytes from PEB 116:75776, read only 1394 bytes, retry
> >>
> >> Most of the time, when the reading of the file fails, I just have to read it once more and it gets read without that error.  
> > 
> > It really looks like a regular bitflip happening "sometimes". Is this a
> > board which already had a life? What are the usage counters (UBI should
> > tell you this) compared to the official endurance of your chip (see the
> > datasheet)?  
> 
> The board had a peaceful life:
> 
> UBI reports "ubi0: max/mean erase counter: 49/20, WL threshold: 4096"

Mmmh. Indeed.

> 
> I have tried with half a dozen boards and all have the issue.
> 
> >   
> >> What am I supposed to do to avoid the ECC weakness warning at startup and to fix that ECC error issue ?  
> > 
> > I honestly don't think the errors come from the 5.1x kernels given the
> > above logs. If you flash back your old 4.14 I am pretty sure you'll
> > have the same errors at some point.  
> 
> I don't have any problem like that with 4.14 with any of the boards.
> 
> When booting a 4.14 kernel I don't get any problem on the same board.
> 

If you can reliably show that when returning to a 4.14 kernel the ECC
weakness disappears, then there is certainly something new. What driver
are you using? Maybe you can do a bisection?

> > 
> > NAND really is a fragile storage medium, not following in a production
> > environment the minimum ECC scheme (there is a real difference between
> > 1/256 vs 4/512) really leads to complicated solutions like this one,
> > unfortunately...  
> 
> I see kernel has "Software BCH ECC". Should I use that with that chip ?
> 
> If yes, how do I use it ? Seems like selecting the option at Kernel build is not enough, do I have to configure something somewhere, for instance in the device tree ? At the time being I have the following in the device tree:

Enabling software BCH in the configuration will just build in the
support. You then need to follow the NAND controller bindings, see the
example in [1].
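
As a rough sketch of what that could look like with the generic
bindings: the node names and the "reg" value below are placeholders,
the controller-specific properties are left out, and it assumes your
controller driver actually parses the generic NAND properties (not all
older drivers do). The strength and step size simply mirror the 4b/512B
requirement printed in the warning.

```dts
/*
 * Illustrative only: node names and "reg" are placeholders, and the
 * controller-specific properties are omitted on purpose.
 */
nand-controller {
	#address-cells = <1>;
	#size-cells = <0>;

	nand@0 {
		reg = <0>;
		/* Use the software ECC engine rather than a hardware one */
		nand-use-soft-ecc-engine;
		/* Select BCH instead of Hamming */
		nand-ecc-algo = "bch";
		/* Mirror the chip requirement from the warning: 4 bits per 512 bytes */
		nand-ecc-strength = <4>;
		nand-ecc-step-size = <512>;
	};
};
```

Keep in mind that the ECC bytes stored in the OOB area differ between
Hamming and BCH, so partitions written with the current scheme will
most likely need to be erased and re-flashed after such a change.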

However, given all the data you provided, I now think that there is
something weird happening in the driver you use, it might be relevant
to try to understand what.

[1] Documentation/devicetree/bindings/mtd/nand-controller.yaml

Thanks,
Miquèl


