UBI wear leveling / torture testing algorithms having trouble with MLC flash

Mon Apr 12 21:00:19 EDT 2010

Setup:
I am using a Samsung 2GB MLC flash with a 4K page size and 8-bit
hardware ECC support. I am running a 2.6.31 kernel, but if I read the
code right, the problem I see is present in the latest 2.6.34-rc2 code
as well.

Summary:
I traced a torture test caught in a repeating loop, wearing out a
flash block. It appears that the torture test, with its simple
0xA5/0x5A/0x00 patterns, passes, whereas actual application data
consistently shows a bit flip on the same block. The MLC flashes I
have used regularly show corrected errors spread over several blocks
over time, and also show consistent corrected errors due to
program-disturb effects. These program-disturb effects appear to be
data dependent and tied to device geometry (e.g. adjacent multi-bit
cell effects). This combination can cause a repeating scrub / torture
test loop. The loop starts when the NAND driver reports a bit
correction in a block.

Details:

(Please correct me if I have got the UBI parts of this explanation wrong. Thanks.)

A single bit correction is a relatively normal event with MLC flash. In
UBI it triggers a scrub operation: the block is considered marginal, and
its data is moved from the marginal block to a good block. A good block
is one that is considered to have passed the UBI torture test, which
simply writes 0xA5, 0x5A, and 0x00 to the block with 3 erases, and then
does a 4th erase for the user data that will eventually be copied to
the block.

But this torture test is too simplistic, since it doesn't understand the
geometry of MLC flashes and, in particular, write-disturb and
read-disturb effects from neighboring cells and columns. What may be
needed is a torture test that understands the geometry and the bit
correction level (1, 4, 8, 12, etc.) and is able to toggle geometrically
neighboring bits, rather than treating the block as a simple memory
test, but this may be difficult. Perhaps torturing MLC blocks rated for
only 3000 erase cycles is inappropriate anyway. Why not just trust the
error correction and only mark uncorrectable blocks as bad?
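
To illustrate why a fixed-pattern test can miss a data-dependent error,
here is a minimal Python sketch. It is a toy model, not UBI or MTD code:
the ToyBlock class, its weak_offset/trigger fault model and the torture()
helper are all hypothetical names made up for this example. The only
things taken from the description above are the three patterns and the
4 erases per torture run.

```python
PATTERNS = (0xA5, 0x5A, 0x00)  # the three fill patterns named above


class ToyBlock:
    """Hypothetical model of one erase block (not a real MTD/UBI interface)."""

    def __init__(self, size, weak_offset=None, trigger=None):
        self.data = bytearray(b"\xff" * size)
        self.erase_count = 0
        self.weak_offset = weak_offset  # assumed position of a marginal cell (>= 1)
        self.trigger = trigger          # assumed neighbour value that disturbs it

    def erase(self):
        self.data = bytearray(b"\xff" * len(self.data))
        self.erase_count += 1

    def write(self, payload):
        if len(payload) != len(self.data):
            raise ValueError("payload must fill the whole block")
        self.data[:] = payload

    def read(self):
        """Return (data, corrected_bits), as an ECC-capable driver might."""
        corrected = 0
        if (self.weak_offset is not None
                and self.data[self.weak_offset - 1] == self.trigger):
            corrected = 1  # ECC repairs the flip but still reports it
        return bytes(self.data), corrected


def torture(block):
    """Fixed-pattern test: 3 erase/write/verify passes plus a final erase."""
    for pattern in PATTERNS:
        block.erase()
        data, corrected = block.read()
        if corrected or any(b != 0xFF for b in data):
            return False
        block.write(bytes([pattern]) * len(block.data))
        data, corrected = block.read()
        if corrected or any(b != pattern for b in data):
            return False
    block.erase()  # 4th erase, leaving the block ready for user data
    return True
```

In this model, a block built with weak_offset=10 and trigger=0x3C passes
torture() at a cost of 4 erases, because none of the uniform patterns
ever places 0x3C next to the weak cell. A payload that does carry 0x3C at
offset 9 then reads back with one corrected bit.
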
Since the torture test passes with no bit errors, I believe it does not
provoke certain bit-error patterns that appear with actual application
data. As a result, when the torture test passes, the block shows no
correctable errors and is returned to the free pool.

The original block X that had a correctable error is scrubbed, and its
data is copied to the free block with the highest erase count. The
highest erase count block is chosen and will remain partially written
and available for new scrubs, and data will be scrubbed to it from other
marginal blocks while the other blocks catch up in erase count. However,
if block Y previously had a bit error and was torture tested and passed,
its erase count will have gone up by 4. As a result, block Y has a
higher probability of being the target of the copy from X to Y. But when
the actual application data is copied, block Y shows a data-dependent
correctable bit error again, the scrub operation fails, and block Y is
again sent out for torture testing. The process repeats until block Y
either shows no error or wears out and is marked bad, and another free
block is chosen.

So not only do blocks wear out, but many blocks can be taken out of
service by being marked bad. If the pool of free blocks available for
bad block management (typically 1% of the total blocks in a partition)
goes bad, the file system is remounted read-only and the product becomes
unusable for writable data in that partition. If I have an MLC part rated
for 3000 erases, then only 750 quick loops of the scrub/torture (4 erase)
cycle, i.e. 3000 / 4, will wear the block out. At that point the next
free block will be used, which may or may not show a corrected error.
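
The loop and the 750 figure can be reproduced with a small simulation.
This is only a sketch of the behaviour as I described it above, under
these assumptions: the scrub target is always the free block with the
highest erase count, a block with a data-dependent error always passes
torture and always shows a flip when real data is copied in, and every
pass through the loop costs 4 erases. The function name and arguments are
made up for the example.

```python
ERASES_PER_LOOP = 4  # 3 torture patterns + 1 final erase, as described above


def simulate(free_erase_counts, faulty, rated_cycles=3000, max_rounds=100000):
    """Model the scrub/torture loop over a pool of free blocks.

    free_erase_counts: dict mapping block id -> current erase count
    faulty: set of block ids that show a data-dependent corrected error
    Returns (loops, blocks_marked_bad, final_target_or_None).
    """
    counts = dict(free_erase_counts)
    loops = 0
    marked_bad = []
    while counts and loops < max_rounds:
        # Assumed policy: pick the free block with the highest erase count.
        target = max(counts, key=lambda b: (counts[b], b))
        if target not in faulty:
            return loops, marked_bad, target  # the scrub finally succeeds
        # Copy shows a bit flip, block is tortured, passes, and is re-queued.
        loops += 1
        counts[target] += ERASES_PER_LOOP
        if counts[target] >= rated_cycles:
            marked_bad.append(target)  # worn out and taken out of service
            del counts[target]
    return loops, marked_bad, None
```

For a single faulty block starting at erase count 0, simulate({7: 0}, {7})
gives 750 loops before the block is marked bad. With a pool such as
{1: 100, 2: 100, 7: 104} and block 7 faulty, block 7 is picked every time
because the torture test keeps pushing its erase count ahead of the rest.
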

Possible Solutions:
I think that with MLC flashes, we perhaps shouldn't be so aggressive about
scrubbing/torturing, given the higher error rate compared with SLC parts.
Also, hiding the actual corrections in the NAND driver to hide the problem
from UBI isn't a good long-term solution either. The UBI algorithm doesn't
appear to understand the different levels of correction possible. For
example, on a 12-bit part, a single corrected error is much more common and
normal than, say, 11 bit corrections on a marginal block, but UBI treats 1
or 11 bits the same algorithmically. Maybe UBI should know the ECC
correction capability so it can make better decisions about when to run the
scrub/torture algorithms? Or perhaps we shouldn't do torturing at all with
MLC flash and rely on the multi-bit correction instead. The correction
level and the flash type could be useful information for UBI to use in
deciding algorithms like these.
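
As a rough sketch of what an ECC-aware decision could look like, the
Python below scales the scrub trigger to the correction capability. None
of this exists in UBI as far as I know: the Action enum, the decide()
function and especially the 0.75 scrub_fraction are assumptions chosen
only to illustrate the idea, and marking a block bad only on an
uncorrectable read follows the suggestion made earlier in this mail.

```python
import math
from enum import Enum


class Action(Enum):
    NONE = "none"          # correction is within the normal range, do nothing
    SCRUB = "scrub"        # block is getting marginal, move the data
    MARK_BAD = "mark_bad"  # ECC could not recover the data


def decide(corrected_bits, ecc_strength, uncorrectable=False,
           scrub_fraction=0.75):
    """Pick an action from the number of corrected bits in one read.

    ecc_strength is the number of bits the ECC can correct (1, 4, 8, 12, ...).
    scrub_fraction is a hypothetical tunable, not an existing UBI parameter.
    """
    if uncorrectable or corrected_bits > ecc_strength:
        return Action.MARK_BAD
    # Never let the threshold drop below 1, so 1-bit ECC parts still scrub
    # on their first (and only possible) correction.
    threshold = max(1, math.ceil(ecc_strength * scrub_fraction))
    if corrected_bits >= threshold:
        return Action.SCRUB
    return Action.NONE
```

With these assumed numbers, a 12-bit part ignores a single corrected bit
(the threshold works out to 9) but scrubs at 11 corrected bits, while a
1-bit part keeps today's behaviour of scrubbing on the first correction.
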

The short-term solution might be for the NAND driver to hide error
corrections from the UBI wear-leveling software by reporting all successful
corrections as 0 errors. Then UBI wouldn't start scrubbing and torturing on
a single bit error in a block. As things stand, two blocks with a single
error each, acting as the scrubbing/torturing pair, are enough to start the
loop of flash erasing and wear-out.

I also attach a log file I collected demonstrating the problem, with some
annotations inline. The log entries are basically the existing printk's plus
a few of my own, converted to log quickly to RAM to avoid cluttering the
console and disturbing real-time behavior.

I hope this helps us understand and support MLC flash a bit better. I am curious to 
know if anyone else has seen problems like these and how they dealt with them.

Thanks.
Darwin


Name: mtd_torture_reasons.txt
URL: <http://lists.infradead.org/pipermail/linux-mtd/attachments/20100412/f4505b45/attachment-0001.txt>