ubi on MLC nand flash
Mike Dunn
mikedunn at newsguy.com
Tue Nov 8 22:04:23 EST 2011
On 11/08/2011 01:45 PM, Artem Bityutskiy wrote:
> On Sun, 2011-11-06 at 12:28 -0800, Mike Dunn wrote:
>>> I suggest the following strategy:
>>>
>>> Upon reading, when errors are detected (and corrected by ecc):
>>> - if (nb of errors < ecc capability (*)) then no scrubbing, do nothing
>>> - if (nb of errors == ecc capability (*)) then
>>> - scrub block, then torture it and compute nb of persistent bitflips
>>> - if (nb of persistent errors < ecc capability (*)) then block is OK
>>> - if (nb of persistent errors == ecc capability (*)) then mark block as bad
>>> [because a single additional bitflip (e.g. a read disturb) would cause
>>> data loss]
>>>
>>> (*) In order to improve reliability, thresholds can be used instead of max ecc
>>> capability.
>>
>> One wrinkle is that the torture test is performed over the entire erase block,
>> not just the page(s) with the correctable error(s). So the bitflip stats are
>> cumulative over the entire block, and the bitflips may not even occur on the same page. The
>> current UBI policy for the torture test is that *any* bitflips on *any* page
>> following the erasure causes the block to be marked bad.
>>
>> Another complication is that there's currently no way to accurately determine in
>> the UBI code the number of bitflips the read operation caused. Currently the
>> occurrence of bitflips (one or more) is determined by the return code from the
>> mtd subsystem, which has exclusive access to the device during the read
>> operation. Just checking the ecc_stats field in the mtd_info structure could
>> include errors in read operations performed by other processes.
> What about something like this.
>
> 1. MTD knows flash's ECC strength (driver sets it)
> 2. MTD sets the scrub level = ECC strength by default
> 3. MTD can expose the scrub level and ECC strength via sysfs and make
> the scrub level sysfs file writable, so the user can vary it between
> 1 and ECC strength.
> 4. MTD just does not report -EUCLEAN if the ECC correction order is
> less than the scrub level.
>
> Then you do not need to change UBI at all.
That sounds reasonable, but the changes seem broadly consequential.
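To make sure I understand the proposal, here is a minimal sketch of the scrub-level check in plain C. Everything in it is hypothetical (struct scrub_policy and the scrub_policy_* helpers are names I made up, not existing MTD code); it only assumes that the driver can report how many bitflips were corrected on a read.

```c
#include <errno.h>

#ifndef EUCLEAN
#define EUCLEAN 117	/* Linux value; "Structure needs cleaning" */
#endif

/* Hypothetical per-device policy; not an existing MTD structure. */
struct scrub_policy {
	unsigned int ecc_strength;	/* max correctable bitflips, set by driver */
	unsigned int scrub_level;	/* defaults to ecc_strength */
};

static void scrub_policy_init(struct scrub_policy *p, unsigned int ecc_strength)
{
	p->ecc_strength = ecc_strength;
	p->scrub_level = ecc_strength;	/* point 2: default = ECC strength */
}

/* Models a write to the sysfs file: valid range is 1..ecc_strength. */
static int scrub_policy_set_level(struct scrub_policy *p, unsigned int level)
{
	if (level < 1 || level > p->ecc_strength)
		return -EINVAL;
	p->scrub_level = level;
	return 0;
}

/* Point 4: stay silent while corrections are below the scrub level. */
static int scrub_policy_read_status(const struct scrub_policy *p,
				    unsigned int corrected)
{
	if (corrected == 0 || corrected < p->scrub_level)
		return 0;
	return -EUCLEAN;
}
```

With a 4-bit ECC and the default level, 3 corrected bitflips would return 0 and 4 would return -EUCLEAN; lowering the level to 2 makes UBI scrub earlier without any change on the UBI side.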
> WRT blank pages, I guess MTD can gain some internal smartness as well -
> the driver can report to the NAND base that a blank page was read, and
> the ECC correction order, then NAND base will make the decision about
> reporting -EUCLEAN and setting the buffer to all 0xFFs.
I haven't yet surveyed the other drivers regarding ecc and blank page reading.
I assumed that ecc was disregarded for blank pages, but probably some drivers
are more thoughtful about it than I originally was.
> Also, it sounds like this may require re-working the current MTD
> interface and turn all these function pointers (mtd->read(), etc) into
> normal functions (mtd_read()) which will allow inserting additional
> logic at various levels.
Oofa. What have I gotten myself into? I don't have all those devices on which
to test the changes, and I'd hate to break a driver. But you're right. Both
mtd and nand interfaces would have to change to provide a mechanism for
returning an error count (corrected or uncorrected) to some
yet-to-be-implemented mtd infrastructure code. Drivers that don't use the NAND
interface currently return -EUCLEAN directly to the higher layer (e.g. UBI).
For drivers using the nand interface, nand_base.c handles it.
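Roughly what I have in mind, as a standalone sketch rather than a patch: the driver hook returns the number of corrected bitflips (or a negative errno), and a common wrapper decides whether to report -EUCLEAN. The names here (struct demo_mtd, demo_mtd_read, demo_fake_read) are invented for the example and do not match the real mtd_info layout; the fake driver only stands in for real hardware.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#ifndef EUCLEAN
#define EUCLEAN 117	/* Linux value */
#endif

/* Hypothetical stand-in for mtd_info; field names are made up. */
struct demo_mtd {
	unsigned int scrub_level;	/* report -EUCLEAN at or above this */
	int fake_result;		/* what the fake driver below returns */
	/*
	 * Driver hook: returns the number of bitflips corrected (>= 0),
	 * or a negative errno such as -EBADMSG for an uncorrectable error.
	 */
	int (*read_raw)(struct demo_mtd *mtd, size_t from, size_t len,
			unsigned char *buf);
};

/* Fake driver for demonstration: fills the buffer with 0xFF. */
static int demo_fake_read(struct demo_mtd *mtd, size_t from, size_t len,
			  unsigned char *buf)
{
	(void)from;
	memset(buf, 0xFF, len);
	return mtd->fake_result;
}

/*
 * Common entry point replacing a direct mtd->read() call. The policy
 * lives here, so neither drivers nor upper layers need to know it.
 */
static int demo_mtd_read(struct demo_mtd *mtd, size_t from, size_t len,
			 unsigned char *buf)
{
	int ret = mtd->read_raw(mtd, from, len, buf);

	if (ret < 0)
		return ret;		/* hard error, pass through */
	if (mtd->scrub_level && (unsigned int)ret >= mtd->scrub_level)
		return -EUCLEAN;	/* data is good, but scrub the block */
	return 0;
}
```

The point of the indirection is that the error count never leaves the read path, so it cannot be polluted by reads from other processes the way a global ecc_stats counter can.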
> WRT ecc_stats - IMHO, it is a useless and rudimentary thing and could be
> just killed...
Some userspace mtd-utils for nand currently use it, though.
I'm able to at least look into making these changes if you want to go ahead. My
motivation is to get a robust ubifs on my diskonchip G4.
Thanks,
Mike