Does UBIFS NAND ECC info get stored in OOB?

Thu Jan 8 21:05:45 PST 2015

Hi Josh,

On Sat, Jan 3, 2015 at 7:52 PM, Josh Wu <josh.wu at atmel.com> wrote:
> Hi, Steve
>
> On 1/3/2015 2:06 AM, Steve deRosier wrote:
>>
>
> There seem to be some UBI fixes in the 3.8.x stable tree. It would be
> better if you could apply them.
>
> ➜  mainline git:(99f3cd5) ✗  git log --oneline v3.8..v3.8.13 | grep -i UBI
> 1afae69 UBIFS: make space fixup work in the remount case
> d90dc15 UBIFS: fix double free of ubifs_orphan objects
> ce7f4e8 UBIFS: fix use of freed ubifs_orphan objects

Will do!  I had pulled in a number of other upstreamed fixes, but these
must be newer than the last time I looked. Thanks!

>
> For the at91sam9x5ek PMECC, we cannot do PMECC correction on an erased
> page (all 0xff) if it has any bit flips.
> The reason is that the 9x5ek PMECC generates a non-0xff ECC code for an
> erased page (all 0xff in the page).
>
> This will cause two issues:
> 1. If a bitflip happens in an erased page's OOB area, it will cause a
> PMECC error.
> 2. If a bitflip happens in an erased page's data area, the bitflip
> cannot be corrected, and the driver won't report any ECC error. I am not
> sure whether this can cause a problem. As UBI may record the erased
> page, the data corruption may not matter. When UBI writes data to the
> bitflipped erased page, the PMECC code is written correctly into the OOB
> area, so the bitflip can then be corrected by the PMECC hardware.
>
> I think you can manually insert a bitflip into an erased page to see
> whether this causes your issue.

Well, our issue is clearly caused by the use of `nandflash -n`.
Moving to ubiformat fixes it.

But, what you pointed out made me interested in a few more problem scenarios:

1. Bitflip in ECC data of a valid data page
2. Bitflip in data area of an erased page
3. Bitflip in the ECC data of an erased page

So I tried them.  I was hoping for the best and fearing the worst.
Thankfully I effectively got the best.
1. This was the scary one for me. But, it seems that this is handled
nicely by the ECC process. dmesg printed:

    atmel_nand 40000000.nand: Bit flip in OOB, oob_byte_pos: 48, bit_pos: 0, 0xec -> 0xed

This is awesome, it found the flip, identified where it was and fixed it. Yay.
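
For anyone reading that log line later, here is a minimal sketch of how the
reported bit position relates to the two byte values. It assumes the two hex
values are the same OOB byte before and after correction; the function name
is made up for illustration and is not part of the driver.

```python
def flipped_bits(before, after):
    """Return the bit positions (0 = least significant) that differ
    between two byte values."""
    diff = (before ^ after) & 0xFF  # XOR leaves a 1 wherever the bytes differ
    return [pos for pos in range(8) if diff & (1 << pos)]


# Values taken from the dmesg line above: 0xec -> 0xed
print(flipped_bits(0xEC, 0xED))  # [0]
```

A single differing bit at position 0 agrees with the `bit_pos: 0` in the
message.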

Both 2 and 3 were non-events.  As near as I could tell, UBIFS and the
MTD system ignored those flips. I have some special code that noticed
them, but none of the stock stuff did.  Writing and reading data there
worked fine.  And, I'd expect that if the pre-existing flip left a wrong
bit in data written there later, ECC would correct it on read, so it
would be fine.
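
One way to notice a flip in an erased page is to count the zero bits and
treat the page as erased when there are only a few. The sketch below shows
that idea only; it is not the special code mentioned above and not what the
atmel_nand driver does, and the function name and threshold are assumptions
chosen for illustration.

```python
def looks_erased(page, max_bitflips):
    """Return True if `page` is all 0xff apart from at most
    `max_bitflips` bits that read back as 0."""
    zero_bits = 0
    for byte in page:
        zero_bits += 8 - bin(byte).count("1")  # zero bits in this byte
        if zero_bits > max_bitflips:
            return False  # too many zeros: this is real data, not an erased page
    return True


# Hypothetical 2048-byte erased page with one bit flipped to 0
page = bytearray(b"\xff" * 2048)
page[100] ^= 0x04
print(looks_erased(bytes(page), max_bitflips=4))        # True
print(looks_erased(b"\x00" * 2048, max_bitflips=4))     # False
```

The threshold would normally be tied to the ECC strength, so that an erased
page is never allowed more flips than ECC could correct once it is written.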

>
> These seem OK.
> Be careful: if you use 1024 as the sector size, you need to apply this fix:
> 2fa831f9db1f <mtd: atmel_nand: pmecc: fix failure to correct bit error in
> 1024-bytes sector>
>

Thanks for the heads up on this fix.  We're using 512, but after
reading some stuff, I'm thinking that going to 1024 might make some
sense, so I might need that.
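
As a rough way to compare the two sector sizes, here is a sketch of the OOB
space the ECC needs per page. It assumes a BCH code that stores m * t parity
bits per sector, with m = 13 for 512-byte sectors and m = 14 for 1024-byte
sectors, rounded up to whole bytes. The 2048-byte page size and the
correction strengths in the example are hypothetical, not values from this
thread.

```python
def ecc_bytes_per_page(page_size, sector_size, t):
    """OOB bytes needed for ECC, assuming BCH with m * t parity bits
    per sector (m = 13 for 512-byte sectors, m = 14 for 1024-byte)."""
    m = {512: 13, 1024: 14}[sector_size]
    per_sector = (m * t + 7) // 8  # round parity bits up to whole bytes
    return (page_size // sector_size) * per_sector


# Hypothetical 2048-byte page
print(ecc_bytes_per_page(2048, 512, 4))    # 4 sectors * 7 bytes  = 28
print(ecc_bytes_per_page(2048, 1024, 8))   # 2 sectors * 14 bytes = 28
```

Under these assumptions both layouts cost the same OOB space and correct
16 bits per page in total, but the 1024-byte layout can absorb up to 8 flips
clustered in one sector where the 512-byte layout can absorb only 4.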

Thanks,
- Steve