[PATCH] tango_nand.c: fix ecc.stats_corrected in empty flash case

Boris Brezillon boris.brezillon at free-electrons.com
Tue May 2 02:42:28 PDT 2017


Hi Marc,

On Mon, 24 Apr 2017 10:58:47 +0200
Marc Gonzalez <marc_gonzalez at sigmadesigns.com> wrote:

> [ Trimming CC list ]
> 
> On 22/04/2017 12:40, Pavel Machek wrote:
> 
> > Fix ecc.stats_corrected in empty flash case.
> > 
> > Signed-off-by: Pavel Machek <pavel at denx.de>
> > 
> > ---
> > 
> > This was suggested by Boris Brezillon in another context. Not tested;
> > I don't have the hardware.
> > 
> > diff --git a/drivers/mtd/nand/tango_nand.c b/drivers/mtd/nand/tango_nand.c
> > index 4a5e948..db4bff4 100644
> > --- a/drivers/mtd/nand/tango_nand.c
> > +++ b/drivers/mtd/nand/tango_nand.c
> > @@ -193,6 +193,8 @@ static int check_erased_page(struct nand_chip *chip, u8 *buf)
> >  						  chip->ecc.strength);
> >  		if (res < 0)
> >  			mtd->ecc_stats.failed++;
> > +		else
> > +			mtd->ecc_stats.corrected += res;
> >  
> >  		bitflips = max(res, bitflips);
> >  		buf += pkt_size;
> >   
> 
> Hello Pavel,
> 
> You may have noticed that ecc_stats.corrected is not updated in
> decode_error_report() which is the main code path, i.e. the path
> that will succeed 99.99% of the time (HW read).
> 
> It turns out that the HW does not report the number of errors
> corrected in a page... Instead it reports two values:
> 1) U = number of errors corrected in the first packet/step
> 2) V = max number of errors corrected in other packets/steps
> 
> Thus, it is not possible to determine the actual number of errors
> corrected in a page (unless V is 0). Otherwise, we just have an
> interval; let n be the number of packets/steps:
> 
> U + V <= corrected errors count <= U + (n-1)*V
> 
> In my opinion, it is better to provide no information than to
> provide incorrect information. Therefore, I did not update
> ecc_stats.corrected in decode_error_report().

Hm, not sure I agree with that. The situation is far from ideal, but
some userspace tools query the number of corrected bits before and
after doing a read operation to report the number of bitflips that
have been correcte. Letting users think there were no bitflips at all is
not a good thing IMO.

> 
> One could argue that updating ecc_stats.corrected in
> check_erased_page() sets the correct value, since the error
> counts are computed in software for each step. But updating
> the value here is IMO pointless if we can't do it in the main
> code path.

Well, it's not pointless if you want to inform the user that some
bitflips were present on the NAND. Yes, the numbers you report
won't be accurate in your case, but at least the user can tell when
bitflips were discovered.

Note that ideally, we should have per-page (or per-block) max-bitflips
information (where max-bitflips is the number returned by
ecc->read_page()), because counting the total number of corrected bits
is certainly not helping when it comes to checking how reliable a NAND
page/eraseblock is.

Regards,

Boris



More information about the linux-mtd mailing list