[PATCH 8/8] mtd: nand: use ECC, if present, when scanning OOB

Mon Jul 16 17:34:02 EDT 2012

On Mon, Jul 16, 2012 at 08:36:56PM +0200, Mike Dunn wrote:
> Hi Ivan, thanks again for the comments.
> 
> On 07/16/2012 07:01 AM, Ivan Djelic wrote:
> > On Sun, Jul 15, 2012 at 10:01:24PM +0200, Mike Dunn wrote:
> >
> >> Yeah, this is a strong argument for ecc on oob-only reads.
> > 
> > Hello Mike,
> > 
> > I think it is a strong argument for a robust reading of BBM, rather than an argument
> > for ECC on OOB-only reads. By "robust reading", I mean simply looking at the Hamming weight of the
> > marker (the number of 1s in the BBM) rather than its value, as done in nand_block_bad() by setting chip->badblockbits.
> > 
> > This robust reading is trivially implemented, does not depend on OOB ecc availability,
> > and benefits all drivers. Even if your driver implements OOB ECC, it may not work
> > on an erased block with a bitflip in its BBM (because erased data may not have a valid
> > ECC). 
> 
> 
> This is a certainty, no?  Erased, by definition, includes any ecc bytes.

If you add the right polynomial[1] to the ECC computed by your Hamming or BCH code, you can make
sure the ECC of an erased page (possibly including OOB bytes) is actually a sequence of 0xff bytes.
This trick enables bitflip correction on data that has never been programmed.
Take a look at nand_ecc.c:63 (or nand_bch.c:62) for an example.
This trick is not possible on all hardware ECC engines.
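A minimal sketch of the trick in C, assuming a toy linear code. Two CRC-8 checksums with a zero initial value stand in for the Hamming or BCH encoder; they are used here only for their linearity, not for correction. ECC_STEP, ECC_BYTES, toy_encode(), init_eccmask() and calculate_ecc() are hypothetical names chosen for illustration. The masking step follows footnote [1] but is not copied from nand_ecc.c or nand_bch.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ECC_STEP  512 /* ASSUMPTION: hypothetical ECC block size in bytes */
#define ECC_BYTES 2   /* ASSUMPTION: hypothetical number of check bytes */

/*
 * CRC-8 with a zero initial value and no final XOR. With init 0 the
 * result is linear over GF(2): crc(a ^ b) == crc(a) ^ crc(b) for
 * equal-length buffers, a property Hamming and BCH codes share.
 */
static uint8_t crc8(const uint8_t *buf, size_t len, uint8_t poly)
{
	uint8_t crc = 0;

	for (size_t i = 0; i < len; i++) {
		crc ^= buf[i];
		for (int b = 0; b < 8; b++)
			crc = (crc & 0x80) ? (uint8_t)((crc << 1) ^ poly)
					   : (uint8_t)(crc << 1);
	}
	return crc;
}

/* Hypothetical toy encoder standing in for a real Hamming/BCH encoder. */
static void toy_encode(const uint8_t *data, uint8_t ecc[ECC_BYTES])
{
	ecc[0] = crc8(data, ECC_STEP, 0x07);
	ecc[1] = crc8(data, ECC_STEP, 0x1d);
}

/* Inverted ECC of an erased (all-0xff) block: the constant from [1]. */
static uint8_t eccmask[ECC_BYTES];

static void init_eccmask(void)
{
	uint8_t erased[ECC_STEP];

	memset(erased, 0xff, sizeof(erased));
	toy_encode(erased, eccmask);
	for (int i = 0; i < ECC_BYTES; i++)
		eccmask[i] ^= 0xff;
}

/* ECC as it would be stored in OOB: raw ECC XORed with the mask. */
static void calculate_ecc(const uint8_t *data, uint8_t ecc[ECC_BYTES])
{
	toy_encode(data, ecc);
	for (int i = 0; i < ECC_BYTES; i++)
		ecc[i] ^= eccmask[i];
}
```

Call init_eccmask() once, then use calculate_ecc() both when programming and when checking a page. An all-0xff buffer then yields all-0xff ECC bytes, so a bitflip in a never-programmed page shows up as an ECC mismatch instead of looking like invalid ECC.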

> 
> > Moreover, reading just the OOB region with ECC may require a full page read on some drivers
> > (when OOB and data are parts of the same codeword).
> > 
> > To me, the only strong reason for wanting OOB ECC is the implementation of YAFFS2 or similar filesystems
> > which require OOB metadata protection. But maybe I'm missing some other use cases?
> > 
> > What do you think?
> 
> 
> If we assume the oob bytes on the first page of a good block can contain
> anything, won't simply counting the bits make the risk of falsely identifying a
> bb marker unacceptably high?

Counting bits in the BBM byte basically tolerates up to 4 bitflips in a _single_ byte, and that many
bitflips in one byte is *extremely* unlikely. If you assume an erased good block can contain garbage in its
OOB region, then it can indeed be wrongly identified as a bad block; but this is also true if you use OOB ECC
(e.g. when the garbage exceeds the ECC correction capacity).
In practice, since the bad block marker byte is normally never programmed to anything other than 0xff,
there is no reason why we should find garbage in it (even if a power failure occurs during an
erase/program operation).
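A minimal sketch of the bit-counting check in C. bbm_is_bad() and popcount8() are hypothetical helpers, and the threshold parameter plays the role of chip->badblockbits mentioned earlier in the thread; this is not the nand_block_bad() code itself. With an assumed threshold of 4, a good marker (0xff) is still read as good with up to 4 bitflips, which is the protection described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Number of 1 bits in a byte (portable; no compiler builtins). */
static int popcount8(uint8_t v)
{
	int n = 0;

	while (v) {
		v &= (uint8_t)(v - 1); /* clear the lowest set bit */
		n++;
	}
	return n;
}

/*
 * Hypothetical Hamming-weight BBM test: the block is reported bad only
 * when fewer than 'threshold' bits of the marker byte are still 1.
 * threshold == 8 is equivalent to the strict "marker != 0xff" test.
 */
static bool bbm_is_bad(uint8_t bbm, int threshold)
{
	assert(threshold >= 1 && threshold <= 8);
	return popcount8(bbm) < threshold;
}
```

For example, bbm_is_bad(0xf0, 4) returns false (four 1 bits remain), whereas bbm_is_bad(0x07, 4) returns true. With threshold 8, a single bitflip in a good marker is enough to make the block look bad.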
BR,

-- 
Ivan

[1] this polynomial is simply the inverted ecc of an erased ecc block
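The footnote can be spelled out as follows, assuming a linear code (Hamming and BCH codes are linear over GF(2)). The notation is introduced here for illustration only: E(d) is the raw ECC of data d, \mathbf{1} is an all-ones (erased) bit string whose length is implied by context, \oplus is bytewise XOR, d_w is the data as written and d_r the data as read back.

```latex
\begin{align*}
m &= E(\mathbf{1}) \oplus \mathbf{1}
  && \text{(inverted ECC of an erased block)}\\
E'(d) &= E(d) \oplus m
  && \text{(ECC actually stored)}\\
E'(\mathbf{1}) &= E(\mathbf{1}) \oplus E(\mathbf{1}) \oplus \mathbf{1} = \mathbf{1}
  && \text{(erased page has all-0xff ECC)}\\
E'(d_r) \oplus E'(d_w) &= E(d_r) \oplus E(d_w) = E(d_r \oplus d_w)
  && \text{(mask cancels, then linearity)}
\end{align*}
```

The last line is why the trick does not hurt correction: the decoder still sees the ECC of the error pattern d_r \oplus d_w, exactly as it would without the mask.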