[PATCH] tango_nand.c: fix ecc.stats_corrected in empty flash case

Boris Brezillon boris.brezillon at free-electrons.com
Thu May 4 01:42:16 PDT 2017


On Wed, 3 May 2017 22:04:27 +0200
Pavel Machek <pavel at ucw.cz> wrote:

> Hi!
> On Mon 2017-04-24 10:58:47, Marc Gonzalez wrote:
> > [ Trimming CC list ]
> > 
> > On 22/04/2017 12:40, Pavel Machek wrote:
> >   
> > > Fix ecc.stats_corrected in empty flash case.
> > > 
> > > Signed-off-by: Pavel Machek <pavel at denx.de>
> > > 
> > > ---
> > > 
> > > This was suggested by Boris Brezillon in another context. Not tested;
> > > I don't have the hardware.
> > > 
> > > diff --git a/drivers/mtd/nand/tango_nand.c b/drivers/mtd/nand/tango_nand.c
> > > index 4a5e948..db4bff4 100644
> > > --- a/drivers/mtd/nand/tango_nand.c
> > > +++ b/drivers/mtd/nand/tango_nand.c
> > > @@ -193,6 +193,8 @@ static int check_erased_page(struct nand_chip *chip, u8 *buf)
> > >  						  chip->ecc.strength);
> > >  		if (res < 0)
> > >  			mtd->ecc_stats.failed++;
> > > +		else
> > > +			mtd->ecc_stats.corrected += res;
> > >  
> > >  		bitflips = max(res, bitflips);
> > >  		buf += pkt_size;
> > >   
> > 
> > Hello Pavel,
> > 
> > You may have noticed that ecc_stats.corrected is not updated in
> > decode_error_report() which is the main code path, i.e. the path
> > that will succeed 99.99% of the time (HW read).
> > 
> > It turns out that the HW does not report the number of errors
> > corrected in a page... Instead it reports two values:
> > 1) U = number of errors corrected in the first packet/step
> > 2) V = max number of errors corrected in other packets/steps
> > 
> > Thus, it is not possible to determine the actual number of errors
> > corrected in a page (unless V is 0). Otherwise, we just have an
> > interval; let n be the number of packets/steps:
> > 
> > U + V <= corrected errors count <= U + (n-1)*V
> > 
> > In my opinion, it is better to provide no information than to
> > provide incorrect information. Therefore, I did not update
> > ecc_stats.corrected in decode_error_report().  
> 
> Well... Having corrected ECC errors is pretty rare, right?

Depends on the NAND chip. Modern SLC NAND chips requiring an ECC
strength of 8 bits per 512 bytes are likely to have frequent bitflips.

> So one
> solution would be to re-compute ECCs in software if we see U or V >
> 0...

Hm, not sure it's worth the trouble for statistics that are rarely used
anyway and, when they are, serve only as a metric to determine how worn
the NAND is.

I'd prefer to see a better user-space interface returning the
max_bitflips information when someone reads from an MTD device (see [1])
rather than trying to fix drivers to return the exact number of
corrected bitflips (which might be impossible for some of them anyway).

[1] http://lists.infradead.org/pipermail/linux-mtd/2016-April/067187.html
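
For illustration only, the interval Marc gives above (U + V <= corrected
errors count <= U + (n-1)*V) can be worked out with a minimal C sketch.
The struct and function names are hypothetical and are not part of
tango_nand.c; the sketch only assumes the two values U and V as Marc
describes them, plus the number of steps n.

```c
#include <assert.h>

/* Hypothetical helper type, not taken from tango_nand.c. */
struct corrected_bounds {
	unsigned int min;          /* lower bound on corrected bitflips in the page */
	unsigned int max;          /* upper bound on corrected bitflips in the page */
	unsigned int max_bitflips; /* worst single step, which is always exact */
};

/*
 * u:      errors corrected in the first packet/step
 * v:      max errors corrected in any of the other packets/steps
 * nsteps: number of packets/steps in the page (assumed >= 2 when v > 0)
 */
static struct corrected_bounds corrected_bounds(unsigned int u,
						 unsigned int v,
						 unsigned int nsteps)
{
	struct corrected_bounds b;

	/* A non-zero v only makes sense if there are "other" steps. */
	assert(nsteps >= 2 || v == 0);

	/* At least one other step had v errors; the rest may have had none. */
	b.min = u + v;
	/* At most, every one of the other (nsteps - 1) steps had v errors. */
	b.max = u + (nsteps > 0 ? (nsteps - 1) * v : 0);
	/* The per-step maximum over the page is known exactly. */
	b.max_bitflips = u > v ? u : v;

	return b;
}
```

With U = 1, V = 2 and n = 4, this gives a corrected count somewhere
between 3 and 7, while the per-step maximum is exactly 2. The two bounds
only meet when V is 0, which matches Marc's remark.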


