CONFIG_MTD_NAND_VERIFY_WRITE with Software ECC

Fri Feb 25 06:36:09 EST 2011

On Fri, Feb 25, 2011 at 10:29:22AM +0000, Artem Bityutskiy wrote:
(...)
> Currently the mechanism to mark a block as bad is the torture function
> failure: we write a pattern, read it back, compare, and do this several
> times with different patterns. In case of any error in any step, or if
> we read back something we did not write, or even if we get a bit-flip
> when we read back the data, we mark the eraseblock as bad. Otherwise it
> is returned to the pool of free eraseblocks.
> 
> See torture_peb() in drivers/mtd/ubi/io.c
> 
> This procedure is not ideal, and could be improved:
> 
> a) we could store the number of times the eraseblock was tortured. Since we
> torture only if there was a write error, too many torture sessions would
> indicate that the eraseblock is unstable.
> b) we could take into account the erase count somehow.
> 
> But yes, the threshold would probably be set up by the system designer in
> the end.
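
For readers following along, the torture sequence and suggestion (a) can be
sketched roughly as below. This is only an illustration against an in-memory
stand-in for an eraseblock, not the code in drivers/mtd/ubi/io.c. The names
sim_block, sim_erase(), sim_write(), sim_read(), torture_block(), BLOCK_SIZE
and MAX_TORTURES are all hypothetical, and the simulation has no ECC, so the
"corrected bit-flip also counts as failure" case is not modelled.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE   64   /* hypothetical, kept tiny for illustration */
#define MAX_TORTURES 3    /* hypothetical limit for suggestion (a) */

/* Hypothetical in-memory stand-in for one eraseblock. */
struct sim_block {
	uint8_t data[BLOCK_SIZE];
	uint8_t stuck_mask;      /* bits of byte 0 that always read back as 0 */
	unsigned torture_count;  /* suggestion (a): torture sessions so far */
};

static void sim_erase(struct sim_block *b)
{
	memset(b->data, 0xFF, BLOCK_SIZE);
}

static void sim_write(struct sim_block *b, const uint8_t *buf)
{
	size_t i;

	/* Programming NAND can only clear bits, never set them. */
	for (i = 0; i < BLOCK_SIZE; i++)
		b->data[i] &= buf[i];
}

static void sim_read(const struct sim_block *b, uint8_t *buf)
{
	memcpy(buf, b->data, BLOCK_SIZE);
	buf[0] &= (uint8_t)~b->stuck_mask;  /* simulate a faulty cell */
}

static int all_bytes_equal(const uint8_t *buf, uint8_t patt)
{
	size_t i;

	for (i = 0; i < BLOCK_SIZE; i++)
		if (buf[i] != patt)
			return 0;
	return 1;
}

/* Returns 0 if the block passed, -1 if it should be marked bad. */
int torture_block(struct sim_block *b)
{
	static const uint8_t patterns[] = { 0xA5, 0x5A, 0x00 };
	uint8_t buf[BLOCK_SIZE];
	size_t i;

	/* Suggestion (a): too many sessions means the block is unstable. */
	if (++b->torture_count > MAX_TORTURES)
		return -1;

	for (i = 0; i < sizeof(patterns); i++) {
		sim_erase(b);
		sim_read(b, buf);
		if (!all_bytes_equal(buf, 0xFF))
			return -1;              /* erase did not stick */

		memset(buf, patterns[i], BLOCK_SIZE);
		sim_write(b, buf);

		/* Poison the buffer so stale contents cannot pass the check. */
		memset(buf, (uint8_t)~patterns[i], BLOCK_SIZE);
		sim_read(b, buf);
		if (!all_bytes_equal(buf, patterns[i]))
			return -1;              /* read back something else */
	}

	sim_erase(b);
	return 0;
}
```

With these assumptions, a healthy block passes until its session count
exceeds MAX_TORTURES, and a block with any stuck bit fails on the first
pattern.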

The fact that a single bitflip detected during torture is enough to decide that
a block is bad causes problems on some 4-bit ECC devices we are using. If we
stick to this policy, we end up with a _lot_ of blocks being marked as bad
(i.e. way too many).

Our NAND manufacturer tells us that, as long as a block erase operation
completes without a failure reported by the device, it should not be classified
as bad, even if it has bitflips (which sounds risky at best).

Right now, we implement a bitflip threshold, below which we correct ECC errors
without reporting them. When the bitflip threshold is reached, we report the
number of corrected errors, triggering block scrubbing, etc.
This is not ideal, but it prevents UBI from torturing and marking too many
blocks as bad.
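
To make the threshold idea concrete, here is a minimal sketch of the filtering
step. It is not our driver code: report_corrected() and BITFLIP_THRESHOLD are
hypothetical names, and the value 3 is only an example for a 4-bit ECC part.

```c
/* Hypothetical threshold; the right value depends on the ECC strength. */
#define BITFLIP_THRESHOLD 3

/*
 * Given the number of bits the ECC corrected in one read, return the
 * count to report to the upper layers. Below the threshold the
 * correction is silent; at or above it the real count is reported,
 * so that the upper layers can schedule scrubbing.
 */
unsigned int report_corrected(unsigned int corrected_bits)
{
	if (corrected_bits < BITFLIP_THRESHOLD)
		return 0;
	return corrected_bits;
}
```

With a threshold of 3, reads with 1 or 2 corrected bits look clean to UBI,
while reads with 3 or 4 corrected bits are reported as such.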

Regards,

Ivan