state of support for "external ECC hardware"
ivan.djelic at parrot.com
Thu Nov 8 13:59:42 EST 2012
On Thu, Nov 08, 2012 at 03:21:25PM +0000, Christopher Harvey wrote:
> We had BCH8 code running, but it wasn't enough. The main reason we
> switched away from host side ECC was because we were getting bitflips
> within the ECC codeword data itself.
But the ECC bytes are part of the BCH codeword, therefore I don't understand
what the issue could be? Are you sure bitflips were not in some unprotected
OOB area?
> Yes, it would have been possible
> to add a 1 byte hamming code to protect the main ECC data, but it was
> just easier to say, "hey, Micron knows their hardware, so we'll trust
> their algorithms", and enable the Micron ECC hardware. Although it
> didn't require too much work to enable, it's all a total hack. I took
> the code that runs the "ECC disabled mode", and sprinkled in some
> extra init code and error checking code. Would be nice to add an
> "external ecc mode" to support these chips explicitly.
> > Support for software-based multiple-bit-resilient ECC mechanism (BCH)
> > was posted (http://lwn.net/Articles/426856/) by Ivan Djelic (which I
> > took liberty to Cc:) and merged in March last year.
> > I haven't been able to track how the situation evolved, but apparently
> > you need to enable it not only in the kernel configuration but also
> > within your flash controller setup.
> > Micron gives an example of how to enable it on a sample NAND host
> > controller S3C6410 in this TN (rest of the code, mainly from the above
> > patch, would be already present in recent kernels):
> > http://www.micron.com/~/media/Documents/Products/Technical%20Note/NAND%20Flash/tn2971_software_bch_ecc_on_linux.pdf
> I haven't looked into current software ECC algorithms in the
> kernel. Do they protect against corrupted ECC data? As in, corruptions
> in the out-of-band (OOB) area?
Yes, BCH ECC works by generating a codeword containing data + ECC bytes.
Errors can be detected and corrected at any location in the codeword (data and ECC).
Note that in practice, only errors in the data actually need to be fixed (not those in the ECC bytes).
When an error is detected in the ECC bytes, it simply has to be reported in order to trigger block scrubbing.
The current software BCH implementation in MTD protects the page data area (and the ECC bytes).
Unlike the Micron on-die ECC, it does not protect additional bytes in the OOB area,
but since the BCH library is not limited to any particular size, a simple patch could achieve this.
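
To illustrate the point about the codeword covering data and ECC alike, here
is a minimal sketch using a toy single-error-correcting BCH(7,4) code
(generator polynomial x^3 + x + 1, i.e. the Hamming(7,4) code). It is not the
kernel BCH library and does not reflect its API or parameters; all toy_* names
and the 4+3 bit layout are assumptions made up for this example. The only
purpose is to show that a flip in the ECC bits is located just like a flip in
the data bits, so the caller can fix data errors and merely report ECC errors.

```c
#include <assert.h>

#define TOY_N        7      /* codeword length in bits (assumed toy size) */
#define TOY_ECC_BITS 3      /* ECC bits, stored in codeword bits 2..0 */
#define TOY_GEN      0x0Bu  /* generator polynomial g(x) = x^3 + x + 1 */

/* Remainder of the 7-bit polynomial v(x) divided by g(x) over GF(2). */
static unsigned toy_mod(unsigned v)
{
    v &= 0x7Fu;
    for (int bit = TOY_N - 1; bit >= TOY_ECC_BITS; bit--)
        if (v & (1u << bit))
            v ^= TOY_GEN << (bit - TOY_ECC_BITS);
    return v;
}

/* Systematic encoding: 4 data bits in bits 6..3, 3 ECC bits in bits 2..0. */
static unsigned toy_encode(unsigned data)
{
    assert(data <= 0xFu);
    unsigned shifted = data << TOY_ECC_BITS;
    return shifted | toy_mod(shifted);
}

/*
 * Correct at most one flipped bit in *cw.
 * Returns -1 if the codeword is clean, otherwise the position (0..6) of the
 * bit that was flipped back. More than one flipped bit is miscorrected,
 * which is the limit of a t=1 code.
 */
static int toy_decode(unsigned *cw)
{
    unsigned syndrome = toy_mod(*cw);

    if (syndrome == 0)
        return -1;
    /* A single error at position pos has syndrome x^pos mod g(x). */
    for (int pos = 0; pos < TOY_N; pos++) {
        if (toy_mod(1u << pos) == syndrome) {
            *cw ^= 1u << pos;
            return pos;
        }
    }
    return -2; /* not reached: every non-zero syndrome matches one position */
}

/* Hypothetical helper: did the corrected bit sit in the ECC part? */
static int toy_error_in_ecc(int pos)
{
    return pos >= 0 && pos < TOY_ECC_BITS;
}

/* Flip every bit of every codeword once; return the number of failures. */
static int toy_selftest(void)
{
    int failures = 0;

    for (unsigned data = 0; data <= 0xFu; data++) {
        unsigned good = toy_encode(data);

        for (int bit = 0; bit < TOY_N; bit++) {
            unsigned cw = good ^ (1u << bit);
            int pos = toy_decode(&cw);

            if (pos != bit || cw != good)
                failures++;
            if (toy_error_in_ecc(pos) != (bit < TOY_ECC_BITS))
                failures++;
        }
    }
    return failures;
}
```

Calling toy_selftest() from a main() should return 0: each single bitflip is
located whether it hits a data bit or an ECC bit. A caller would use
toy_error_in_ecc() to decide between "data was fixed" and "only the ECC part
was hit, report it so the block gets scrubbed".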