[PATCH 1/2] mtd: nand: add erased-page bitflip correction

Huang Shijie b32955 at freescale.com
Thu Mar 13 04:04:27 EDT 2014


On Wed, Mar 12, 2014 at 09:55:19PM -0700, Brian Norris wrote:
> Hi Huang,
> 
> On Wed, Mar 12, 2014 at 04:06:07PM +0800, Huang Shijie wrote:
> > On Tue, Mar 11, 2014 at 02:11:51AM -0700, Brian Norris wrote:
> > > +	/* Check each ECC step individually */
> > > +	for (i = 0; i < chip->ecc.steps; i++) {
> 
> [...]
> 
> > The gpmi uses a NAND layout like this:
> > 
> > 	 *    +---+----------+-+----------+-+----------+-+----------+-+
> > 	 *    | M |   data   |E|   data   |E|   data   |E|   data   |E|
> > 	 *    +---+----------+-+----------+-+----------+-+----------+-+
> > 
> >         M : The metadata, which is 10 by default.	 
> >  	E : The ECC parity for "data" or "data + M".
> > 
> > If you want to count each sector, it makes the code very complicated. :(
> 
> I see. That does make things complicated.
> 
> But I'm curious: why does GPMI's ecc.read_page_raw() implementation (the
> default) have to give such a different data/spare layout than its
> ecc.read_page() implementation? There seems to be no benefit to this,
> except that it means gpmi-nand doesn't have to implement its own
> read_page_raw() callback...

The raw data of the NAND is like the diagram above.

Do you mean i should implement the ecc.read_page_raw to read out all
the "data"s in the diagram above, and concatenate them together? 

What is the "raw" data meaning? I feel confused now.


> 
> Look, for instance, at nand_base's nand_read_page_raw_syndrome(). It
> actually bothers to untangle a layout like yours and place it into the
> appropriate places in the data and OOB buffers. I think this is the
> ideal implementation for read_page_raw(), at which point my patch makes
> more sense; all NAND drivers should have a (reasonably) similar layout
> when viewed from nand_base, without any mixed data/OOB striping like in
> your diagram.
We'd better add more comment in the:
 * @read_page_raw:	function to read a raw page without ECC

Make it clear that the @read_oob_raw read out the data which do not contain
the ECC parity. 

> 
> Side topic: did you plan on looking at this FIXME in gpmi-nand.c?
> 
>  * FIXME: The following paragraph is incorrect, now that there exist
>  * ecc.read_oob_raw and ecc.write_oob_raw functions.
> 
I should remove these stale comments.

> It might be worth addressing at the same time, since the stale comments
> near the FIXME talk about a difference between a raw and an ECC-based
> "view" of the page. IMO, there should be no difference between these
The difference is caused by the block marker swap. That is maybe a different
topic from this one.

> types of "view", if at all possible. The only difference should be that
> one tries to apply ECC and the other doesn't.
> 
> > > @@ -1554,6 +1610,18 @@ read_retry:
> > >  				break;
> > >  			}
> > >  
> > > +			/* Check an ECC error for 0xff + bitflips? */
> > > +			if ((chip->options & NAND_ECC_NEED_CHECK_FF) &&
> > > +					mtd->ecc_stats.failed - ecc_failures) {
> > > +				ret = nand_verify_erased_page(mtd, bufpoi,
> > > +						page);
> > I think we can merge this code into the read-retry check like:
> 
> There's some very specific ordering requirements for some of the steps,
> so we'd have to be careful about moving this code downward (max_bitflips
> checks; the !aligned buffer copy; nand_transfer_oob()). I'm not sure
> your suggestion is possible. And now that I mention it, I think the
> NAND_NEED_READRDY check might already be a bit misplaced in my patch...

okay.


> 
> > 	---------------------------------------------
> > 			if (mtd->ecc_stats.failed - ecc_failures) {
> > 
> > 				// step 1: count the bitflips, assume N.
> > 				N = bitflips_count();
> > 				// step 2:  
> > 				if (N < ecc_strength) {
> > 					do the erase-page-check;
> > 				} else
> > 					do the read-retry.
> 
> I'm not 100% clear on the pseudocode here, but it looks like you're
> saying erased-page checks and read-retry would be mutually exclusive. I
> believe that is wrong. In fact, they may *both* be required; we should

yes, that is what i meant. they both are required.
But could we find a way to know that it is _NOT_ an erased-page-ECC-fail?

> first check if the ECC failure was just an erased page, and if not, we
> should retry the page.
My pseudocode code is not accurate enough.
Maybe it should like this:
                    if (is_NOT_an_erase_page_failure()) {
			    do the read-retry;
		    } else {
			   if (check_erase_page_ok())
				   /* nothing */;
			   else
				do the read-retry;
		    }

thanks
Huang Shijie




More information about the linux-mtd mailing list