NAND ECC capabilities
Ezequiel Garcia
ezequiel at vanguardiasur.com.ar
Wed Jan 7 20:17:49 PST 2015
On 01/08/2015 12:10 AM, Steve deRosier wrote:
> So, doing further experiments and I wondered if someone could confirm
> this finding.
>
> With atmel_nand, we're setup for 4-bit ECC on 512 sectors with a 2k
> page. I was thinking about this a bit and realized that there's 4 of
> these sectors per page, and this implies then that we can detect and
> correct 4 bad bits _per_ each sector. Assuming that they're evenly
> spread, that's up to 16 bad bits per page. Obviously in practice,
> that assumption wouldn't hold...
>
Not sure why you say that wouldn't hold.
> So, is my understanding correct?
>
I'm not familiar with atmel-nand, but as far as I know, you are right. A
4-bit ECC strength over 512-byte sectors means exactly that.
Most likely your ECC hardware stores four ECC values (one for each
512-byte sector in your 2048-byte page) in the OOB of the page. Each ECC
value is used to correct up to 4 bits in its own sector, which is why you
can correct up to 16 bits per page when they are spread evenly.
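To put numbers on it, here is a minimal Python sketch of that arithmetic. It assumes the geometry from this thread (2048-byte page, 512-byte sectors, 4-bit strength); the function names are made up for illustration and are not part of any driver.

```python
PAGE_SIZE = 2048     # bytes per page (geometry from this thread)
SECTOR_SIZE = 512    # bytes covered by each ECC value
ECC_STRENGTH = 4     # correctable bits per sector


def sectors_per_page(page_size=PAGE_SIZE, sector_size=SECTOR_SIZE):
    """Number of independently protected sectors in one page."""
    return page_size // sector_size


def max_correctable_per_page(page_size=PAGE_SIZE, sector_size=SECTOR_SIZE,
                             strength=ECC_STRENGTH):
    """Best case: every sector carries exactly `strength` bitflips."""
    return sectors_per_page(page_size, sector_size) * strength


def page_is_correctable(flips_per_sector, strength=ECC_STRENGTH):
    """A page is correctable only if no single sector exceeds the strength."""
    return all(n <= strength for n in flips_per_sector)


print(sectors_per_page())                  # 4
print(max_correctable_per_page())          # 16
print(page_is_correctable([4, 4, 4, 4]))   # True: 16 flips, evenly spread
print(page_is_correctable([5, 0, 0, 0]))   # False: only 5 flips, all in one sector
```

The last two lines show why the per-page total alone says little: what matters is the worst sector.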
> I took it further and decided to play with this experimentally. On my
> UBIFS rootfs, I flipped 3 bits in the first sector of a page and then
> 3 more in the second sector. From my kernel log I got this:
>
> [ 78.304687] atmel_nand 40000000.nand: Bit flip in data area,
> byte_pos: 98, bit_pos: 3, 0x31 -> 0x39
> [ 78.304687] atmel_nand 40000000.nand: Bit flip in data area,
> byte_pos: 98, bit_pos: 2, 0x39 -> 0x3d
> [ 78.304687] atmel_nand 40000000.nand: Bit flip in data area,
> byte_pos: 98, bit_pos: 1, 0x3d -> 0x3f
> [ 78.304687] atmel_nand 40000000.nand: Bit flip in data area,
> byte_pos: 530, bit_pos: 6, 0x8e -> 0xce
> [ 78.304687] atmel_nand 40000000.nand: Bit flip in data area,
> byte_pos: 530, bit_pos: 5, 0xce -> 0xee
> [ 78.304687] atmel_nand 40000000.nand: Bit flip in data area,
> byte_pos: 530, bit_pos: 4, 0xee -> 0xfe
> [ 78.304687] UBI: fixable bit-flip detected at PEB 20
> [ 78.304687] UBI: schedule PEB 20 for scrubbing
> [ 78.328125] atmel_nand 40000000.nand: Bit flip in data area,
> byte_pos: 98, bit_pos: 3, 0x31 -> 0x39
> [ 78.328125] atmel_nand 40000000.nand: Bit flip in data area,
> byte_pos: 98, bit_pos: 2, 0x39 -> 0x3d
> [ 78.328125] atmel_nand 40000000.nand: Bit flip in data area,
> byte_pos: 98, bit_pos: 1, 0x3d -> 0x3f
> [ 78.328125] atmel_nand 40000000.nand: Bit flip in data area,
> byte_pos: 530, bit_pos: 6, 0x8e -> 0xce
> [ 78.328125] atmel_nand 40000000.nand: Bit flip in data area,
> byte_pos: 530, bit_pos: 5, 0xce -> 0xee
> [ 78.328125] atmel_nand 40000000.nand: Bit flip in data area,
> byte_pos: 530, bit_pos: 4, 0xee -> 0xfe
> [ 78.343750] UBI: fixable bit-flip detected at PEB 20
> [ 78.382812] UBI: scrubbed PEB 20 (LEB 0:18), data moved to PEB 250
>
> So, my takeaway from this is a couple of things:
>
> 1. Yes, it can correct more than 4 bits per page as long as those are
> on different sectors of the page.
Correct. It can correct as much as advertised: 4 bits per 512-byte sector.
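You can check this against the log you pasted. The sketch below is only an illustration in Python: it takes the payload of the atmel_nand messages quoted above (with the wrapped lines joined and the timestamp and device prefix dropped) and groups the flips by 512-byte sector. The regular expression and the function name are my own assumptions, not anything the driver provides.

```python
import re
from collections import Counter

SECTOR_SIZE = 512    # bytes covered by each ECC value
ECC_STRENGTH = 4     # correctable bits per sector

# Payload of one "Bit flip in data area" message, as it appears in the log above.
FLIP_RE = re.compile(
    r"byte_pos: (\d+), bit_pos: (\d+), 0x([0-9a-fA-F]+) -> 0x([0-9a-fA-F]+)"
)

LOG = """\
byte_pos: 98, bit_pos: 3, 0x31 -> 0x39
byte_pos: 98, bit_pos: 2, 0x39 -> 0x3d
byte_pos: 98, bit_pos: 1, 0x3d -> 0x3f
byte_pos: 530, bit_pos: 6, 0x8e -> 0xce
byte_pos: 530, bit_pos: 5, 0xce -> 0xee
byte_pos: 530, bit_pos: 4, 0xee -> 0xfe
"""


def flips_per_sector(log, sector_size=SECTOR_SIZE):
    """Count reported bitflips per sector, keyed by sector index."""
    counts = Counter()
    for m in FLIP_RE.finditer(log):
        byte_pos, bit_pos = int(m.group(1)), int(m.group(2))
        before, after = int(m.group(3), 16), int(m.group(4), 16)
        # Sanity check: each message should describe exactly one changed bit.
        assert before ^ after == 1 << bit_pos
        counts[byte_pos // sector_size] += 1
    return dict(counts)


counts = flips_per_sector(LOG)
print(counts)                                            # {0: 3, 1: 3}
print(all(n <= ECC_STRENGTH for n in counts.values()))   # True
```

So the page carried 6 flips in total, but each sector only saw 3, which is within the 4-bit strength.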
> 2. My test of 6 bits hit the 4 bit threshold setting and at that point
> UBI decided that maybe something is wrong with that PEB.
Correct. Read/program disturb accumulates and that produces bitflips.
Since these bitflips can be eliminated by erasing the block, UBI will do
that before the block gets worse.
> 3. When it did, UBI corrected the data and copied it elsewhere
Actually, your NAND controller (or MTD software ECC) corrected the data
and reported the number of bitflips to UBI.
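Roughly, the hand-off looks like the sketch below. This is a simplified Python model, not kernel code: the function names are made up, the errno value is hard-coded, and it assumes the MTD behaviour of returning -EUCLEAN once the reported bitflip count reaches the bitflip threshold (4 in your setup, going by your description).

```python
EUCLEAN = 117  # Linux errno "Structure needs cleaning"; hard-coded as an assumption


def mtd_read_status(reported_bitflips, bitflip_threshold):
    """Model of the MTD layer: the data has already been corrected here.

    Only the status tells the caller whether the count reached the threshold.
    """
    if reported_bitflips >= bitflip_threshold:
        return -EUCLEAN
    return 0


def ubi_action(status):
    """Model of UBI's reaction to the status of a successful, corrected read."""
    if status == -EUCLEAN:
        return "schedule PEB for scrubbing"
    return "nothing to do"


print(ubi_action(mtd_read_status(3, 4)))   # nothing to do
print(ubi_action(mtd_read_status(4, 4)))   # schedule PEB for scrubbing
```

The point is that UBI never sees the raw flipped data; it only sees corrected data plus a status, and decides from that whether to move the contents somewhere else.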
> 4. Then UBI scrubbed. I assume it then did the torture test. Since I
> manually made a flip, it found it was fine once it erased it, so it
> didn't mark it as bad. I checked my BBT and it's not marked. So I
> assume it's erased and ready for use again.
>
Yes, UBI tortures the PEB on occasion. However, this happens only
under certain circumstances (you'll have to dig into the code for details). I
don't think it was tortured in your case (the block just had a few
artificial bitflips, but other than that it was healthy).
Torture comes with a noisy message "run torture test for PEB %d", so you
would notice.
> Is my general understanding correct?
>
I think so, yes.
--
Ezequiel Garcia, VanguardiaSur
www.vanguardiasur.com.ar