UBI wear leveling / torture testing algorithms having trouble with MLC flash
drambo at broadcom.com
Mon Apr 12 21:00:19 EDT 2010
I am using a Samsung 2GB MLC flash with 4K page size and 8-bit
hardware ECC support. I am using a 2.6.31 kernel, but if I read the
code right, the problem I see is in the latest 2.6.34-rc2 code as well.
I traced a torture test caught in a repeating loop, wearing out a
flash block. It appears that the torture test with its simple
0xA5/0x5A/0x00 pattern is passing, whereas actual application data
is consistently showing a bit flip on the same block. MLC flashes I
have used regularly show corrected errors over several blocks in time,
and also show corrected errors consistently due to program-disturb
effects, but these program disturb effects appear to be data dependent,
based on device geometry (e.g. adjacent multi-bit cell effects). The
scenario above can cause a repeating scrubbing / torture test loop
to occur. It is initiated by the NAND driver reporting a bit correction
in a block, which starts the scrubbing/torturing loop.
(Please correct me if I have got the UBI parts of this explanation wrong. Thanks.)
A single bit correction is a relatively normal event with MLC flash and
this triggers a scrub operation in UBI in which the block is considered
marginal, and the data is moved from a marginal block to a good block. A
good block is one that is considered to have passed the UBI torture test,
which simply writes 0xA5, 0x5A, and 0x00 to a block with 3 erases, and
then does a 4th erase for the user data that will eventually be copied to
this block. But this torture test is too simplistic since it doesn't
understand the geometry of MLC flashes and in particular, write-disturb
and read-disturb effects from neighboring cells and columns. What's maybe
needed is a torture test that understands the geometry and the ECC correction
level (1, 4, 8, 12 bits, etc.) and is able to toggle geometrically neighboring bits rather
than treating the block as a simple memory test, but this may be difficult. Perhaps
torturing MLC blocks rated for only 3000 erase cycles is inappropriate anyway; why not
just trust the error correction and only mark uncorrectable blocks as bad?
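
To make the point concrete, here is a toy model (not UBI code) of why a uniform-pattern test can pass on a block that fails with real data. The fault model, the block size and all names are assumptions made up for illustration; real program-disturb behavior depends on the device geometry.

```python
# Toy model only: the fault below is hypothetical, not measured on any part.
PATTERNS = (0xA5, 0x5A, 0x00)   # the three patterns named in this post
BLOCK_SIZE = 64                 # tiny "block", for illustration
VICTIM = 10                     # hypothetical weak byte position


def program_and_read(data):
    """Write `data` to the toy block and return what reads back.

    Assumed fault: bit 0 of the victim byte flips only when its two
    neighbouring bytes are 0xFF and 0x00 (a data-dependent disturb).
    """
    out = bytearray(data)
    if data[VICTIM - 1] == 0xFF and data[VICTIM + 1] == 0x00:
        out[VICTIM] ^= 0x01
    return bytes(out)


def torture_passes():
    """Fill the block with each uniform pattern and compare the read-back."""
    for pattern in PATTERNS:
        written = bytes([pattern]) * BLOCK_SIZE
        if program_and_read(written) != written:
            return False
    return True


def bitflips(written):
    """Count bits that differ between the written and read-back data."""
    read = program_and_read(written)
    return sum(bin(a ^ b).count("1") for a, b in zip(written, read))
```

With a uniform fill the two neighbours always hold the same value, so the assumed fault can never trigger and `torture_passes()` succeeds, while a buffer with 0xFF and 0x00 on either side of the victim byte reads back with a flipped bit.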
Since the torture test passes with no bit errors I believe that it didn't
find certain error correction patterns that appear with actual application
data. As a result, when the torture test passes, the block shows no correctable
errors, and is returned to the free pool. The original block X that had a
correctable error is scrubbed and its data is copied to the free block with
the highest erase count. That block is chosen and will remain
partially written and available for new scrubs, and data will be scrubbed to it
from other marginal blocks while the other blocks catch up in erase count. However,
if block Y previously had a bit error and was torture tested and passed, its
erase count will have gone up by 4. As a result, block Y has a higher probability of
being the target of the copy from X to Y. But when the copy of actual application
data is done, block Y now shows a data-dependent correctable bit error again and
the scrub operation fails, and block Y is again sent out for torture testing, and
the process repeats itself until block Y either shows no error or the block wears
out, is marked bad, and another free block is chosen. So not only do blocks wear
out, but many blocks can be taken out of service as well by being marked bad. If
the pool of free blocks reserved for bad block management (typically 1% of the
total blocks in a partition) is exhausted, then the file system is remounted read-only
and the product becomes unusable for writable data in that partition. If I have an
MLC part rated for 3000 erase cycles, then only 750 quick loops of the scrub/torture (4 erase) cycle
will wear the block out. At that point the next free block will be used, which may
or may not show a corrected error.
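
The arithmetic behind the 750 figure can be checked with a few lines. This is only a sketch of the loop as I described it, assuming block Y passes the torture test every time and shows a bit flip on every copy; the function name is made up.

```python
def scrub_torture_loops(rated_erase_cycles, erases_per_loop=4):
    """Count scrub/torture loops until the block reaches its rated erases.

    Each loop costs 4 erases: 3 for the torture patterns plus 1 before
    the user data is copied in (the figures given in this post).
    """
    erase_count = 0
    loops = 0
    while erase_count < rated_erase_cycles:
        erase_count += erases_per_loop
        loops += 1
    return loops


print(scrub_torture_loops(3000))  # 750
```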
I think that with MLC flashes, we perhaps shouldn't be so aggressive about scrubbing/torturing,
given that errors are more frequent than we see with SLC parts. Also, hiding the actual
corrections in the NAND driver to mask the problem in UBI isn't a good long-term
solution either. The UBI algorithm doesn't appear to understand the different levels
of correction possible. For example, on a 12-bit part, a single corrected error
is much more common and normal than, say, 11 bit corrections on a marginal block,
but UBI treats 1 or 11 bits the same algorithmically. Perhaps the ECC correction capability
should be known to UBI so it can make better decisions about when to run the
scrub/torture algorithms? Or perhaps we don't do torturing at all with MLC flash and
rely on the multi-bit correction instead. The correction level and the
flash type could be useful information for UBI to use in choosing algorithms like these.
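
One possible shape for such a decision, as a minimal sketch: compare the number of corrected bits against a fraction of the ECC strength. The function name, the return values and the 50% fraction are all assumptions for illustration, not anything UBI or the NAND layer provides today as far as I know.

```python
def scrub_decision(corrected_bits, ecc_strength, scrub_fraction=0.5):
    """Classify a read result given how many bits the ECC had to correct.

    Hypothetical policy: scrub only once the corrections reach a given
    fraction of what the ECC can handle. The threshold never drops
    below 1, so a 1-bit ECC part still scrubs on its first correction.
    """
    if corrected_bits > ecc_strength:
        return "uncorrectable"
    threshold = max(1, int(ecc_strength * scrub_fraction))
    if corrected_bits >= threshold:
        return "scrub"
    return "ignore"


# 12-bit ECC part: 1 corrected bit is ignored, 11 triggers a scrub.
for bits in (1, 11):
    print(bits, scrub_decision(bits, ecc_strength=12))
```

With something like this, a single corrected bit on a 12-bit part would no longer start the scrub/torture loop, while a block close to the correction limit still would.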
The short-term solution might be for the NAND driver to hide error corrections from
the UBI wear-leveling software by reporting all successful corrections as 0 errors. Then
UBI wouldn't start scrubbing and torturing on a single bit error in a block. As it stands, two
blocks that each show a single error, paired as the source and target of a scrub, are enough to start the
loop of flash erasing and wear-out.
I also attach a log file I collected demonstrating the problem, with some annotations
inline. The log is basically the existing printk's plus a few of my own, converted to
log quickly to RAM to avoid cluttering the console and disturbing real-time behavior.
I hope this helps us understand and support MLC flash a bit better. I am curious to
know if anyone else has seen problems like these and how they dealt with them.
Disclaimer - Any views or opinions presented in this e-mail are solely those of the author
and do not necessarily represent those of the company.