[PATCH] mtd: nand: Add support for Micron on-die ECC controller (rev2).

Tue Apr 1 01:36:44 PDT 2014

On Fri, Mar 28, 2014 at 09:52:37AM -0600, David Mosberger wrote:
> On Thu, Mar 27, 2014 at 12:56 AM, Gupta, Pekon <pekon at ti.com> wrote:
> 
> >>+      set_on_die_ecc(mtd, chip, 1);
> >>+
> >>+      chkoob = chkbuf + mtd->writesize;
> >>+      rawoob = rawbuf + mtd->writesize;
> >>+      eccpos = chip->ecc.layout->eccpos;
> >>+      for (i = 0; i < chip->ecc.steps; ++i) {
> >>+              /* Count bit flips in the actual data area: */
> >>+              flips = bitdiff(chkbuf, rawbuf, chip->ecc.size);
> >>+              /* Count bit flips in the ECC bytes: */
> >>+              for (j = 0; j < chip->ecc.bytes; ++j) {
> >
> > You should check bit-flips in the complete OOB region (mtd->oobsize), not just ecc.bytes.
> 
> I was under the impression that OOB data bytes cannot be assumed to be
> ECC protected. As it happens, when using internal ECC on those Micron
> chips, *some* of the OOB data bytes are ECC protected, but I didn't
> think it was necessary to count those for bitflips, since OOB users
> won't assume ECC protection anyhow. Am I wrong about that?

This is a touchy subject. Most of your comments are correct;
traditionally, ECC did not protect the OOB area, and some of its main
users (like JFFS2) assume that it is unprotected. That assumption is
patently false on some modern systems, which do protect it.

But in this case, the max_bitflips count is used to determine when
the error rate is unacceptably high, not just whether your data is
currently corrupt. So if extra bitflips in this (otherwise unimportant)
spare area might eventually cause the ECC hardware to report an error,
then we need to count them.
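To make that concrete, here is a minimal sketch in C of counting each
ECC step's bitflips over its data plus its share of the *whole* spare
area (as suggested above), and keeping the worst step as max_bitflips.
bitdiff() is only a stand-in for the helper in the quoted patch, whose
body isn't shown; max_step_bitflips() is hypothetical, and it assumes
the OOB area is split evenly across ECC steps, which may not match the
real on-die ECC layout of a given Micron part.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the patch's bitdiff(): number of differing
 * bits between two buffers, i.e. the popcount of their XOR.
 */
static unsigned int bitdiff(const uint8_t *a, const uint8_t *b, size_t len)
{
	unsigned int flips = 0;

	for (size_t i = 0; i < len; i++) {
		uint8_t x = a[i] ^ b[i];

		while (x) {
			x &= (uint8_t)(x - 1);	/* clear lowest set bit */
			flips++;
		}
	}
	return flips;
}

/*
 * Hypothetical sketch: worst-case bitflips over all ECC steps, where a
 * step's count includes its slice of the full spare area, not just the
 * ECC bytes. ASSUMPTION: OOB is divided evenly (oobsize / steps).
 */
static unsigned int max_step_bitflips(const uint8_t *chkbuf,
				      const uint8_t *rawbuf,
				      size_t writesize, size_t oobsize,
				      size_t steps)
{
	size_t step_data = writesize / steps;
	size_t step_oob = oobsize / steps;
	const uint8_t *chkoob = chkbuf + writesize;
	const uint8_t *rawoob = rawbuf + writesize;
	unsigned int max_bitflips = 0;

	for (size_t i = 0; i < steps; i++) {
		unsigned int flips;

		/* Bitflips in this step's data area... */
		flips = bitdiff(chkbuf + i * step_data,
				rawbuf + i * step_data, step_data);
		/* ...plus every spare byte assigned to the step, ECC or not. */
		flips += bitdiff(chkoob + i * step_oob,
				 rawoob + i * step_oob, step_oob);

		if (flips > max_bitflips)
			max_bitflips = flips;
	}
	return max_bitflips;
}
```

With a 4-byte page, 4 spare bytes and 2 steps, a single flipped data bit
plus a fully flipped spare byte in step 1 reports 8, whereas counting the
data area alone would report only 1, hiding a step that is close to its
correction limit.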

Brian