nand: WARNING: a0000000.nand: the ECC used on your system (1b/256B) is too weak compared to the one required by the NAND chip (4b/512B)
Miquel Raynal
miquel.raynal at bootlin.com
Sat Jun 19 11:40:35 PDT 2021
Hi Christophe,
> >> Now and then I'm using one of the latest kernels (Today is 5.13-rc6), and sometime in one of the 5.x releases, I started to get errors like:
> >>
> >> [ 5.098265] ecc_sw_hamming_correct: uncorrectable ECC error
> >> [ 5.103859] ubi0 warning: ubi_io_read: error -74 (ECC error) while reading 60 bytes from PEB 99:59824, read only 60 bytes, retry
> >> [ 5.525843] ecc_sw_hamming_correct: uncorrectable ECC error
> >> [ 5.531571] ecc_sw_hamming_correct: uncorrectable ECC error
> >> [ 5.537490] ubi0 warning: ubi_io_read: error -74 (ECC error) while reading 3073 bytes from PEB 107:108976, read only 3073 bytes, retry
> >> [ 5.691121] ecc_sw_hamming_correct: uncorrectable ECC error
> >> [ 5.696709] ecc_sw_hamming_correct: uncorrectable ECC error
> >> [ 5.702426] ecc_sw_hamming_correct: uncorrectable ECC error
> >> [ 5.708141] ecc_sw_hamming_correct: uncorrectable ECC error
> >> [ 5.714103] ubi0 warning: ubi_io_read: error -74 (ECC error) while reading 3035 bytes from PEB 107:25144, read only 3035 bytes, retry
> >> [ 20.523689] random: crng init done
> >> [ 21.892130] ecc_sw_hamming_correct: uncorrectable ECC error
> >> [ 21.897730] ubi0 warning: ubi_io_read: error -74 (ECC error) while reading 1394 bytes from PEB 116:75776, read only 1394 bytes, retry
> >>
> >> Most of the time, when the reading of the file fails, I just have to read it once more and it gets read without that error.
> >
> > It really looks like a regular bitflip happening "sometimes". Is this a
> > board which already had a life? What are the usage counters (UBI should
> > tell you this) compared to the official endurance of your chip (see the
> > datasheet)?
>
> The board had a peaceful life:
>
> UBI reports "ubi0: max/mean erase counter: 49/20, WL threshold: 4096"
Mmmh. Indeed.
>
> I have tried with half a dozen boards and all have the issue.
>
> >
> >> What am I supposed to do to avoid the ECC weakness warning at startup and to fix that ECC error issue ?
> >
> > I honestly don't think the errors come from the 5.1x kernels given the
> > above logs. If you flash back your old 4.14 I am pretty sure you'll
> > have the same errors at some point.
>
> I don't have any problem like that with 4.14 on any of the boards.
>
> When booting a 4.14 kernel I don't get any problem on the same board.
>
If you can reliably show that when returning to a 4.14 kernel the ECC
weakness disappears, then there is certainly something new. What driver
are you using? Maybe you can do a bisection?
> >
> > NAND really is a fragile storage medium. Not following the minimum
> > ECC scheme in a production environment (there is a real difference
> > between 1/256 and 4/512) really leads to complicated solutions like
> > this one, unfortunately...
>
> I see kernel has "Software BCH ECC". Should I use that with that chip ?
>
> If yes, how do I use it ? Seems like selecting the option at Kernel build is not enough, do I have to configure something somewhere, for instance in the device tree ? At the time being I have the following in the device tree:
Enabling software BCH in the configuration will just build in the
support. You then need to follow the NAND controller bindings, see the
example in [1].
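
As a rough sketch only, a NAND chip subnode following the generic
bindings could look like the fragment below. The node names and the unit
address are assumptions taken from the "a0000000.nand" in your log, and
the strength/step-size values simply mirror the 4b/512B requirement
reported in the warning. Whether your driver actually parses these
properties from a chip subnode depends on the driver, and the OOB area
must be large enough to hold the extra ECC bytes.

```dts
/* Hypothetical node names and address; adapt to your controller's binding. */
nand-controller@a0000000 {
	/* compatible, reg and other controller-specific properties go here */
	#address-cells = <1>;
	#size-cells = <0>;

	nand@0 {
		reg = <0>;
		nand-use-soft-ecc-engine;	/* do the correction in software */
		nand-ecc-algo = "bch";		/* BCH instead of Hamming */
		nand-ecc-strength = <4>;	/* 4 correctable bits ... */
		nand-ecc-step-size = <512>;	/* ... per 512-byte step */
	};
};
```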
However, given all the data you provided, I now think that there is
something weird happening in the driver you use. It might be relevant
to try to understand what.
[1] Documentation/devicetree/bindings/mtd/nand-controller.yaml
Thanks,
Miquèl