ubi on MLC nand flash
Artem Bityutskiy
dedekind1 at gmail.com
Tue Nov 8 16:45:18 EST 2011
On Sun, 2011-11-06 at 12:28 -0800, Mike Dunn wrote:
> > I suggest the following strategy:
> >
> > Upon reading, when errors are detected (and corrected by ecc):
> > - if (nb of errors < ecc capability (*)) then no scrubbing, do nothing
> > - if (nb of errors == ecc capability (*)) then
> > - scrub block, then torture it and compute nb of persistent bitflips
> > - if (nb of persistent errors < ecc capability (*)) then block is OK
> > - if (nb of persistent errors == ecc capability (*)) then mark block as bad
> > [because a single additional bitflip (e.g. a read disturb) would cause
> > data loss]
> >
> > (*) In order to improve reliability, thresholds can be used instead of max ecc
> > capability.
>
>
> One wrinkle is that the torture test is performed over the entire erase block,
> not just the page(s) with the correctable error(s). So the bitflip stats are
> cumulative over the entire block, and may not even occur on the same page. The
> current UBI policy for the torture test is that *any* bitflips on *any* page
> following the erasure cause the block to be marked bad.
>
> Another complication is that there's currently no way to accurately determine in
> the UBI code the number of bitflips the read operation caused. Currently the
> occurrence of bitflips (one or more) is determined by the return code from the
> mtd subsystem, which has exclusive access to the device during the read
> operation. Just checking the ecc_stats field in the mtd_info structure could
> include errors in read operations performed by other processes.
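A minimal C sketch of the quoted two-step policy follows. It rests on several assumptions, and the function names are hypothetical. The bitflip count for a read is assumed to come from that read itself, not from the shared ecc_stats counter, for the reason given above. Per-page counts also stand in for what real controllers report per ECC step. "threshold" is either the ECC capability or a lower value, as in footnote (*). Using >= instead of == covers both cases.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Step 1 (hypothetical helper): after a read with corrected errors,
 * decide whether the block should be scrubbed and tortured. */
static bool should_scrub(unsigned int read_bitflips, unsigned int threshold)
{
	assert(threshold > 0);
	return read_bitflips >= threshold;
}

/*
 * Torture-test results cover the whole eraseblock. Taking the worst
 * single page (not the sum over all pages) avoids adding up flips that
 * landed on different pages and were each well within ECC capability.
 */
static unsigned int worst_page_bitflips(const unsigned int *per_page,
					size_t npages)
{
	unsigned int worst = 0;

	for (size_t i = 0; i < npages; i++)
		if (per_page[i] > worst)
			worst = per_page[i];
	return worst;
}

/* Step 2 (hypothetical helper): after scrub + torture, decide whether
 * the persistent bitflips leave too little ECC margin. */
static bool should_mark_bad(unsigned int persistent_bitflips,
			    unsigned int threshold)
{
	assert(threshold > 0);
	return persistent_bitflips >= threshold;
}
```

With threshold = 1, should_mark_bad() behaves like the current UBI torture-test policy described above, where any bitflip after erasure retires the block.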
What about something like this:
1. MTD knows the flash's ECC strength (the driver sets it)
2. MTD sets the scrub level = ECC strength by default
3. MTD can expose the scrub level and ECC strength via sysfs and make
the scrub level sysfs file writable, so the user can vary it between
1 and ECC strength.
4. MTD simply does not report -EUCLEAN if the ECC correction order (the
number of corrected bitflips) is less than the scrub level.
Then you do not need to change UBI at all.
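The four steps above can be sketched roughly as follows. This is an illustration only. struct mtd_sketch and its functions are hypothetical stand-ins, not the real struct mtd_info or MTD API. max_bitflips is assumed to be the largest number of bitflips the driver corrected in any single ECC step of the read.

```c
#include <assert.h>
#include <errno.h>

#ifndef EUCLEAN
#define EUCLEAN 117 /* fallback only; Linux <errno.h> normally defines it */
#endif

/* Hypothetical, simplified stand-in for the MTD device structure. */
struct mtd_sketch {
	unsigned int ecc_strength; /* step 1: set by the driver */
	unsigned int scrub_level;  /* steps 2-3: 1..ecc_strength */
};

static void mtd_sketch_init(struct mtd_sketch *mtd, unsigned int ecc_strength)
{
	assert(ecc_strength > 0);
	mtd->ecc_strength = ecc_strength;
	mtd->scrub_level = ecc_strength; /* step 2: default = ECC strength */
}

/* Step 3: what a writable sysfs attribute's store handler might check. */
static int mtd_sketch_set_scrub_level(struct mtd_sketch *mtd,
				      unsigned int level)
{
	if (level < 1 || level > mtd->ecc_strength)
		return -EINVAL;
	mtd->scrub_level = level;
	return 0;
}

/* Step 4: stay silent below the scrub level, report -EUCLEAN otherwise. */
static int mtd_sketch_read_status(const struct mtd_sketch *mtd,
				  unsigned int max_bitflips)
{
	return max_bitflips >= mtd->scrub_level ? -EUCLEAN : 0;
}
```

UBI keeps treating -EUCLEAN as "scrub this block". Only the point at which MTD returns it moves.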
WRT blank pages, I guess MTD can gain some internal smartness as well:
the driver can report to the NAND base that a blank page was read, along
with the ECC correction order, and the NAND base will then decide whether
to report -EUCLEAN and set the buffer to all 0xFFs.
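The following sketches that split of responsibilities. It assumes a hypothetical nand_read_report structure passed from the driver to the NAND base. count_zero_bits() is also hypothetical. It shows one way a driver might measure how far an erased page is from all-0xFF.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#ifndef EUCLEAN
#define EUCLEAN 117 /* fallback only; Linux <errno.h> normally defines it */
#endif

/* Hypothetical report handed from the driver to the NAND base. */
struct nand_read_report {
	bool blank_page;       /* driver believes the page is erased */
	unsigned int bitflips; /* ECC correction order for this read */
};

/* Driver side: count bits that are 0 in a page that should be all 0xFF. */
static unsigned int count_zero_bits(const unsigned char *buf, size_t len)
{
	unsigned int n = 0;

	for (size_t i = 0; i < len; i++)
		for (unsigned char b = (unsigned char)~buf[i]; b; b &= b - 1)
			n++; /* each iteration clears the lowest set bit */
	return n;
}

/* NAND base side: clean up blank pages and decide on -EUCLEAN. */
static int nand_base_finish_read(unsigned char *buf, size_t len,
				 const struct nand_read_report *rep,
				 unsigned int scrub_level)
{
	if (rep->blank_page)
		memset(buf, 0xFF, len); /* hide stray flipped bits from users */
	return rep->bitflips >= scrub_level ? -EUCLEAN : 0;
}
```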
Also, it sounds like this may require re-working the current MTD
interface and turning all these function pointers (mtd->read(), etc.)
into normal functions (mtd_read()), which would allow inserting
additional logic at various levels.
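Here is a rough sketch of the idea. It uses a hypothetical, heavily simplified struct mtd_dev rather than the real struct mtd_info. The driver still supplies a function pointer, but users call the normal function mtd_read(), where common checks and the scrub-level policy can live. fake_read() is an illustrative stub, not a real driver.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#ifndef EUCLEAN
#define EUCLEAN 117 /* fallback only; Linux <errno.h> normally defines it */
#endif

/* Hypothetical, simplified device: the driver hook is never called directly. */
struct mtd_dev {
	int (*_read)(struct mtd_dev *mtd, long long from, size_t len,
		     size_t *retlen, unsigned char *buf,
		     unsigned int *max_bitflips);
	unsigned int scrub_level;
	long long size;
};

/* The normal function every user calls instead of the hook. */
static int mtd_read(struct mtd_dev *mtd, long long from, size_t len,
		    size_t *retlen, unsigned char *buf)
{
	unsigned int max_bitflips = 0;
	int ret;

	*retlen = 0;
	/* Common argument checking, done once for all drivers. */
	if (from < 0 || from > mtd->size || (long long)len > mtd->size - from)
		return -EINVAL;

	ret = mtd->_read(mtd, from, len, retlen, buf, &max_bitflips);
	if (ret < 0)
		return ret;
	/* Common policy, also done once for all drivers. */
	return max_bitflips >= mtd->scrub_level ? -EUCLEAN : 0;
}

/* Illustrative stub driver: returns 0xFF data and pretends it
 * corrected fake_bitflips bits. */
static unsigned int fake_bitflips;

static int fake_read(struct mtd_dev *mtd, long long from, size_t len,
		     size_t *retlen, unsigned char *buf,
		     unsigned int *max_bitflips)
{
	(void)mtd;
	(void)from;
	memset(buf, 0xFF, len);
	*retlen = len;
	*max_bitflips = fake_bitflips;
	return 0;
}
```

Because every caller goes through mtd_read(), a change to the policy (for example, the scrub-level comparison) touches one function instead of every driver.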
WRT ecc_stats: IMHO, it is a useless, vestigial thing and could just be
killed...
Artem.
More information about the linux-mtd mailing list