[PATCH 0/3] MTD: Change meaning of -EUCLEAN return code on reads
Peter Barada
peter.barada at gmail.com
Fri Mar 16 18:57:53 EDT 2012
On 03/16/2012 05:54 PM, Shmulik Ladkani wrote:
>
> So question is, would you consider 4 bit errors in the first ECC portion
> to be "a dangerously high number of bit errors" as what's reported to
> the MTD users?
> If so, then yes, the cleaning decision should be according to the ecc
> step level, not at the page reading level.
If you had an ECC method that could correct N bits over the entire page,
and the ECC showed that N-1 bits needed correcting, then it should be
obvious that the page is in danger of becoming uncorrectable. This
should be the same as when there are multiple ECC steps per page and a
single step shows N-1 bits that need correcting. I think the indication
from MTD should be the worst case found across all the ECC steps...
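A minimal sketch of the "worst case across ECC steps" rule, in C. The function names, the per-step bitflip array and the `margin` parameter are hypothetical (this is not an existing MTD interface); it only illustrates that the decision is taken on the worst single ECC step, not on the total for the page.

```c
#include <stddef.h>

/*
 * Hypothetical helper: given the number of bits corrected in each ECC
 * step of one page, return the worst (largest) per-step count.
 */
static unsigned int page_max_bitflips(const unsigned int *step_bitflips,
                                      size_t nsteps)
{
    unsigned int max = 0;

    for (size_t i = 0; i < nsteps; i++)
        if (step_bitflips[i] > max)
            max = step_bitflips[i];
    return max;
}

/*
 * Hypothetical policy: flag the page as needing scrubbing (what
 * -EUCLEAN would signal to MTD users) when the worst step is within
 * `margin` bits of the per-step correction strength. A margin of 1
 * corresponds to the "N-1 bits" case.
 */
static int page_needs_scrub(const unsigned int *step_bitflips, size_t nsteps,
                            unsigned int ecc_strength, unsigned int margin)
{
    unsigned int worst = page_max_bitflips(step_bitflips, nsteps);

    return worst + margin >= ecc_strength;
}
```

With four steps of 4-bit ECC and a margin of 1, a page whose steps corrected {0, 0, 3, 0} bits is flagged, while one that corrected {1, 1, 1, 1} is not, even though its page total (4) is higher.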
The bigger issue is how to discern whether the degradation is due to
read-disturb (which can be recovered from by erasing/reprogramming the
block) or to the page physically wearing out (in which case the block
needs to be retired). For first-generation SLC parts with large
geometries this was relatively straightforward, since the block didn't
show *any* bitflips until it got close to its wear limit. With
smaller-geometry SLC (and definitely with MLC) things are not so
straightforward.
In discussions, at least one NAND manufacturer indicated that the
"proper" method is to track reads per block (somehow across power
cycles), and when the number of reads of a block since its last erasure
hits a limit, refresh the block, *and* disregard statistical counting
of bitflips. The read patterns across pages/blocks can affect the
number of bitflips seen; apparently this has to do with how the
physical geometry of the cells is laid out (due to energized address
lines that run nearby, though no details for the part in question were
provided).
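A rough sketch of the read-counting scheme described above, assuming a hypothetical per-block counter and a read limit taken from the part's datasheet. Making the counter survive power cycles is deliberately not shown, since that is exactly the piece MTD lacks.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical per-block bookkeeping for read-disturb handling.
 * Persisting this across power cycles is assumed, not implemented.
 */
struct blk_read_state {
    uint32_t reads_since_erase;
};

/*
 * Record one page read in the block. Returns true once the block has
 * reached the (assumed) read limit and should be refreshed, i.e. its
 * data moved/reprogrammed and the block erased. Bitflip counts are
 * deliberately not consulted, per the manufacturer's advice.
 */
static bool blk_note_read(struct blk_read_state *b, uint32_t read_limit)
{
    if (b->reads_since_erase < UINT32_MAX)  /* avoid wrap-around */
        b->reads_since_erase++;
    return b->reads_since_erase >= read_limit;
}

/* Erasing the block (e.g. as part of a refresh) resets the count. */
static void blk_note_erase(struct blk_read_state *b)
{
    b->reads_since_erase = 0;
}
```

The caller would refresh the block as soon as `blk_note_read()` returns true and then call `blk_note_erase()` when the erase completes.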
Unfortunately there's currently no method (that I know of) in MTD to
keep a non-volatile count of page reads within a block between erases
that could be used to handle the read-disturb case. If one existed (and
also kept track of erase counts), then it should be possible to handle
both cases. A NAND manufacturer's rating of "in temperature range M,
with N years of retention, you can get X UBER if you limit reads to Y
thousand reads/block and Z thousand erasures" would then be
tractable...
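If MTD did keep both counts, the decision could look something like the following sketch. The structures and the `blk_classify()` helper are hypothetical, and the two limits stand in for a manufacturer's Y (reads/block) and Z (erasures) figures; nothing here reflects an existing MTD interface.

```c
#include <stdint.h>

enum blk_action { BLK_OK, BLK_REFRESH, BLK_RETIRE };

/* Hypothetical datasheet-style limits for one part. */
struct nand_rating {
    uint32_t max_reads_per_erase;  /* "Y thousand reads/block" */
    uint32_t max_erase_cycles;     /* "Z thousand erasures" */
};

/* Hypothetical per-block counters, assumed to be kept non-volatile. */
struct blk_counters {
    uint32_t reads_since_erase;
    uint32_t erase_count;
};

/*
 * Decide what to do with a block: retire it once it has reached its
 * erase rating (wear-out), refresh it once it has reached its read
 * rating since the last erase (read-disturb), otherwise leave it alone.
 */
static enum blk_action blk_classify(const struct blk_counters *c,
                                    const struct nand_rating *r)
{
    if (c->erase_count >= r->max_erase_cycles)
        return BLK_RETIRE;
    if (c->reads_since_erase >= r->max_reads_per_erase)
        return BLK_REFRESH;
    return BLK_OK;
}
```

Retirement is checked first, since refreshing a block that has already reached its erase rating would only spend another erase cycle on it.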
--
Peter Barada
peter.barada at gmail.com
More information about the linux-mtd mailing list