state of support for "external ECC hardware"

Calvin Johnson linux.cj at gmail.com
Tue Nov 20 06:13:16 EST 2012


Hi,

I thought I would share my recent experience with an MLC NAND that
requires 24-bit ECC.

On Fri, Nov 9, 2012 at 2:16 PM, Ricard Wanderlof
<ricard.wanderlof at axis.com> wrote:
>
> On Thu, 8 Nov 2012, Gerlando Falauto wrote:
>
>>> We had BCH8 code running, but it wasn't enough. The main reason we
>>> switched away from host side ECC was because we were getting bitflips
>>> within the ECC codeword data itself.
>>
>>
>> Wow... I mean, I figured it wouldn't be that easy to (purposedly) get
>> bitflips in any area, I wonder what kind of test you managed to come up with
>> in order to get bitflips within the ECC area itself. In my case it takes
>> several hours (of continuous reads) to get a single bitflip within a 1Gb
>> (128MB) flash.
>
>
> There are 1Gb flashes and 1Gb flashes. Depending on the technology used
> during manufacture (essentially the scale of the on-chip structures, usually
> specified as 'xxx nm technology') the bit error probabilities can vary.
>
> "Traditional" 1Gb flashes where the manufacturer recommends 1-bit ECC in
> practice very rarely exhibit bit flips. I have seen bit flips in the OOB
> area as well as the main area (there was a bug in nand_ecc.c many years ago
> which didn't handle this correctly which is how I discovered what was going
> on); indeed there's nothing different about the OOB area in terms of bit
> flips, it's just another area of (the same type of) flash. The probability
> for the whole OOB area is of course less than for the rest as it is smaller,
> but it is the same per bit if I understand it correctly.
>
> Some manufacturers (Micron for instance I believe) have started to deliver 1
> Gb chips using a higher density technology where they specify a requirement
> for 4-bit ECC. These naturally exhibit a much higher bitflip rate.
>

I'm using Micron's MT29F16G08CBACA.
Minimum required ECC: 24-bit ECC per 1080 bytes of data.
The H/W ECC controller (external to the NAND flash) I'm using supports
24-bit ECC. I had a tough time when I first started working with this
NAND flash. Not being aware of the minimum required ECC, I was using
Hamming (1-bit) correction. This was unreliable at a rate of about
1 in 6, i.e. 1 boot out of 6 failed.
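
To give a feel for what "24-bit ECC per 1080 bytes" costs in spare
area, here is a rough sketch of the arithmetic. It assumes a binary BCH
code over GF(2^m), which needs at most m*t parity bits to correct t bit
errors, with m chosen as the smallest value for which data plus parity
fit in 2^m - 1 bits. The function names are made up for illustration;
the actual layout used by a given H/W ECC controller may differ.

```c
/*
 * Smallest field order m such that a BCH codeword of length 2^m - 1 bits
 * can hold data_bytes of data plus m * t parity bits.
 * Returns 0 if no m in the range 5..15 is large enough.
 */
static unsigned int bch_field_order(unsigned int data_bytes, unsigned int t)
{
    for (unsigned int m = 5; m <= 15; m++) {
        unsigned int n = (1u << m) - 1;   /* codeword length in bits */

        if (n >= data_bytes * 8 + m * t)
            return m;
    }
    return 0;
}

/* Upper bound on parity bytes: m * t bits, rounded up to whole bytes. */
static unsigned int bch_parity_bytes(unsigned int data_bytes, unsigned int t)
{
    unsigned int m = bch_field_order(data_bytes, t);

    return (m * t + 7) / 8;
}
```

Under these assumptions, bch_parity_bytes(1080, 24) gives m = 14 and
42 parity bytes per 1080-byte chunk, compared with 13 bytes for 8-bit
correction over 512 bytes.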

Since I switched to 24-bit ECC with UBIFS, everything has worked
properly, without any issues so far.

With JFFS2, however, there are still many issues. I assume these are
due to bit flips in the OOB area that are not covered by ECC.
Erased pages have no ECC protection either: JFFS2 reads the first
256 bytes of data and checks that they are all 0xFF to confirm that a
page is erased, in addition to checking the clean marker it reads from
the OOB.
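
To illustrate why a strict all-0xFF test is fragile on this kind of
flash, here is a minimal sketch of an erased-region check that
tolerates a few bit flips. This is not the JFFS2 code; the function
names and the max_bitflips threshold are assumptions for illustration
only.

```c
#include <stddef.h>
#include <stdint.h>

/* Count bits that read back as 0; a perfectly erased region has none. */
static unsigned int count_zero_bits(const uint8_t *buf, size_t len)
{
    unsigned int zeros = 0;

    for (size_t i = 0; i < len; i++) {
        uint8_t v = (uint8_t)~buf[i];     /* 1 bits mark flipped cells */

        while (v) {
            v &= (uint8_t)(v - 1);        /* clear the lowest set bit */
            zeros++;
        }
    }
    return zeros;
}

/*
 * Treat the region as erased if at most max_bitflips bits read as 0.
 * With max_bitflips == 0 this is the strict "all 0xFF" test.
 */
static int looks_erased(const uint8_t *buf, size_t len,
                        unsigned int max_bitflips)
{
    return count_zero_bits(buf, len) <= max_bitflips;
}
```

With max_bitflips set to 0, a single flipped bit in the first 256 bytes
is enough for an erased page to be rejected, which matches the kind of
failure described above.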

From various articles on the internet, it seems that NAND flashes are
going to get denser and bit flips are going to increase. Hence H/W ECC
controllers are going to be in greater demand. The S/W BCH algorithm
available in Linux consumes plenty of cycles, and that work can be
offloaded to a H/W ECC controller.

> At any rate, the ECC algorithm itself should be able to take care of bit
> flips in the ECC codes. For the 1-bit algorithm in nand_ecc.c it does this
> by comparing the computed ECC with the actual ECC; if there's a difference
> of exactly one bit (rather than a more complex diff which after calculations
> points out the flipped bit in the main area), it is assumed that the bitflip
> is in the ECC area rather than the data. I don't know how BCH does this
> though.
>
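
The 1-bit rule described above can be sketched as follows. This is a
simplified illustration, not the nand_ecc.c code: the enum and function
names are made up, and the ECC is assumed to fit in the low bits of a
32-bit word. The case of a flipped data bit is only flagged here, not
decoded.

```c
#include <stdint.h>

enum ecc_verdict {
    ECC_NO_ERROR,      /* stored and computed ECC match */
    ECC_FLIP_IN_ECC,   /* exactly one differing bit: the ECC itself flipped */
    ECC_NEEDS_DECODE   /* more complex difference: look at the data area */
};

/* Count the bits set in v. */
static unsigned int popcount32(uint32_t v)
{
    unsigned int n = 0;

    while (v) {
        v &= v - 1;    /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Compare the ECC read from flash with the ECC computed over the data. */
static enum ecc_verdict classify_ecc(uint32_t stored, uint32_t computed)
{
    uint32_t diff = stored ^ computed;

    if (diff == 0)
        return ECC_NO_ERROR;
    if (popcount32(diff) == 1)
        return ECC_FLIP_IN_ECC;
    return ECC_NEEDS_DECODE;
}
```
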
regards,
Calvin


