CONFIG_MTD_NAND_VERIFY_WRITE with Software ECC

Fri Feb 25 03:31:27 EST 2011

Hi,

On Tue, 2011-02-15 at 14:00 +0000, David Peverley wrote:
> From the description of read disturb, it occurs due to many reads
> (hundreds of thousands or millions) prior to an erase. Currently my
> testing is using nandtest.c and mtd_stresstest.ko - the former tests
> one cycle before re-programming and the latter is random but not
> expected to be more than tens of reads before a re-programme becomes
> statistically likely. Potentially program disturb sounds like it
> _could_ be the behaviour I observe but it's not clear.

If you verify the page just after you have programmed it, then program
disturb is out of the picture. AFAIK, program disturb is about
"disturbing" other NAND pages while programming.

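To make the verify step concrete, here is a minimal user-space sketch (not
the kernel's CONFIG_MTD_NAND_VERIFY_WRITE code) of comparing the buffer you
programmed against what you read back from the same page. The names
count_bitflips(), classify_verify() and the ecc_strength parameter are
hypothetical and only for illustration; ecc_strength is assumed to be the
number of bit errors the ECC can correct in the compared region.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count the bits that differ between the data written and the data read back. */
static unsigned int count_bitflips(const uint8_t *written,
                                   const uint8_t *readback, size_t len)
{
    unsigned int flips = 0;

    assert(written != NULL && readback != NULL);
    for (size_t i = 0; i < len; i++) {
        uint8_t diff = written[i] ^ readback[i];

        /* Clear the lowest set bit until none remain. */
        while (diff) {
            diff &= (uint8_t)(diff - 1);
            flips++;
        }
    }
    return flips;
}

/* Hypothetical outcome of a verify pass, for illustration only. */
enum verify_result { VERIFY_OK, VERIFY_CORRECTABLE, VERIFY_FAILED };

/*
 * Classify a verify pass. ecc_strength is an assumption: the number of
 * bit errors the ECC can correct in the region that was compared.
 */
static enum verify_result classify_verify(unsigned int flips,
                                          unsigned int ecc_strength)
{
    if (flips == 0)
        return VERIFY_OK;
    return flips <= ecc_strength ? VERIFY_CORRECTABLE : VERIFY_FAILED;
}
```

Since the comparison only looks at the page that was just programmed, any
bit-flip it reports is in that page and not in a neighbouring page that the
programming might have disturbed.
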
> My general take on this is that only the permanent type failures i.e.
> those involving permanently stuck bits, require marking as bad blocks.
> The recovery recommended for the other scenarios is always to erase
> and re-programme. This potentially opens up a whole can of worms... My
> interpretation of this is that if we verify a write and we've had a
> (correctable and non-permanent) single bit error the Right Thing To Do
> would be to erase and re-programme the block, probably with a very
> small retry limit. We could argue that it's the responsibility of the
> file-system to do this but programmatically I think nand_write_page()
> is best placed to be able to do this.

Yeah, UBI does this for example. If we program an eraseblock and get an
error while writing a NAND page, we try to recover:

1. We pick another eraseblock.
2. We move the data from the faulty eraseblock to the new one.
3. We erase and torture the faulty eraseblock.
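
The three steps above can be sketched roughly like this. This is a toy
in-memory model, not UBI code: struct toy_block, the toy_*() helpers and
the "faulty" flag are all made up for illustration, and the torture step is
only flagged, not implemented. It also assumes the pages already written to
the faulty eraseblock can still be read back.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_NUM_BLOCKS      4
#define TOY_PAGES_PER_BLOCK 4
#define TOY_PAGE_SIZE       8

/* Hypothetical in-memory eraseblock, for illustration only. */
struct toy_block {
    unsigned char data[TOY_PAGES_PER_BLOCK][TOY_PAGE_SIZE];
    int pages_written;
    bool in_use;
    bool faulty;         /* simulation knob: page writes to this block fail */
    bool needs_torture;  /* set once the block has been erased after a failure */
};

static struct toy_block flash[TOY_NUM_BLOCKS];

/* Write the next page of an eraseblock; returns 0 on success, -1 on error. */
static int toy_write_page(int blk, const unsigned char *buf)
{
    struct toy_block *b;

    assert(blk >= 0 && blk < TOY_NUM_BLOCKS);
    b = &flash[blk];
    if (b->faulty || b->pages_written >= TOY_PAGES_PER_BLOCK)
        return -1;
    memcpy(b->data[b->pages_written++], buf, TOY_PAGE_SIZE);
    b->in_use = true;
    return 0;
}

/* Find an unused eraseblock other than 'exclude'; returns -1 if none. */
static int toy_pick_free_block(int exclude)
{
    for (int i = 0; i < TOY_NUM_BLOCKS; i++)
        if (i != exclude && !flash[i].in_use && !flash[i].needs_torture)
            return i;
    return -1;
}

static void toy_erase(int blk)
{
    memset(flash[blk].data, 0xFF, sizeof(flash[blk].data));
    flash[blk].pages_written = 0;
    flash[blk].in_use = false;
}

/* Returns the eraseblock that ends up holding the data, or -1 on failure. */
static int toy_write_with_recovery(int blk, const unsigned char *buf)
{
    int new_blk;

    if (toy_write_page(blk, buf) == 0)
        return blk;

    /* 1. Pick another eraseblock. */
    new_blk = toy_pick_free_block(blk);
    if (new_blk < 0)
        return -1;

    /* 2. Move the data written so far, then the page that failed. */
    for (int p = 0; p < flash[blk].pages_written; p++)
        if (toy_write_page(new_blk, flash[blk].data[p]) != 0)
            return -1;
    if (toy_write_page(new_blk, buf) != 0)
        return -1;

    /* 3. Erase the faulty eraseblock and mark it for torture testing. */
    toy_erase(blk);
    flash[blk].needs_torture = true;
    return new_blk;
}
```

The caller only sees which eraseblock ended up holding the data, so it can
update its own mapping and carry on writing there.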

-- 
Best Regards,
Artem Bityutskiy (Артём Битюцкий)