Is it an atomic operation for writing a page in NAND flash
Ricard Wanderlof
ricard.wanderlof at axis.com
Wed Jan 20 11:09:26 EST 2010
On Wed, 20 Jan 2010, Liu Hui wrote:
> Thank you very much, CRC is the real solution.
>
> But I don't understand: if a partial write happens, we use ECC to
> correct the data, we find that the data can't be corrected, then
> -EBADMSG is returned (see nand_correct_data()), and then we know
> this page is corrupted. IMHO, this works.
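Taking the software ECC algorithm used by mtd as the example, it only
produces correct results when there are 0, 1 or 2 bit errors in the data:
no error is reported as OK, a single bit error is corrected, and two bit
errors are detected as uncorrectable. With more bit errors than that, the
result is undefined.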
Let's assume a partial write occurs, which leads to 57 bit errors compared
to what was originally supposed to be there. Since there are more than 2
bit errors, the algorithm's output is undefined: it may say that the data
can't be corrected, it may say that the data is OK, or it may say that the
data can be corrected (and then "correct" the wrong bit); it's impossible
to tell. As far as I understand, it is not uncommon for ECC to say the
data is correct when in fact it isn't.
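To make the 0/1/2/more-than-2 behaviour concrete, here is a minimal Python sketch of a Hamming-style ECC over a 256-byte block (3 ECC bytes), modeled loosely on the line-parity/column-parity layout of mtd's software ECC. It is not a copy of nand_ecc.c or nand_correct_data(): the names `hamming_ecc`, `correct` and `read_back` are hypothetical, and the packing of the 22 parity bits into 24 bits is an assumption.

```python
BLOCK = 256  # bytes covered by one 3-byte ECC


def hamming_ecc(data: bytes) -> int:
    """24-bit Hamming-style ECC: 16 line-parity bits, 6 column-parity bits, inverted."""
    assert len(data) == BLOCK
    lp = [0] * 16  # pair (2k, 2k+1): parity of bytes whose index has bit k == 0 / == 1
    cp = [0] * 6   # pair (2j, 2j+1): parity of bit columns whose bit index has bit j == 0 / == 1
    for i, byte in enumerate(data):
        byte_parity = bin(byte).count("1") & 1
        for k in range(8):
            lp[2 * k + ((i >> k) & 1)] ^= byte_parity
        for b in range(8):
            bit = (byte >> b) & 1
            for j in range(3):
                cp[2 * j + ((b >> j) & 1)] ^= bit
    code = 0
    for n, parity in enumerate(lp + cp):
        code |= parity << n
    return ~code & 0xFFFFFF  # inverted, so an all-zero parity pattern reads as 0xFFFFFF


def correct(data: bytearray, stored_ecc: int) -> str:
    """Compare stored and recomputed ECC; fix `data` in place if it looks like a 1-bit error."""
    syndrome = stored_ecc ^ hamming_ecc(bytes(data))
    if syndrome == 0:
        return "ok"
    pairs = [(syndrome >> (2 * n)) & 0b11 for n in range(11)]
    if all(p in (0b01, 0b10) for p in pairs):
        # Every pair differs in exactly one bit: decode it as the position of one flipped bit.
        byte_index = sum((pairs[k] >> 1) << k for k in range(8))
        bit_index = sum((pairs[8 + j] >> 1) << j for j in range(3))
        data[byte_index] ^= 1 << bit_index
        return "corrected"
    if bin(syndrome).count("1") == 1:
        return "ecc_error"  # only the stored ECC itself has a flipped bit
    return "uncorrectable"


original = bytes(range(BLOCK))
stored = hamming_ecc(original)


def read_back(flips):
    """Flip the given (byte, bit) positions, run the decoder, report status and data correctness."""
    data = bytearray(original)
    for byte_index, bit_index in flips:
        data[byte_index] ^= 1 << bit_index
    status = correct(data, stored)
    return status, bytes(data) == original


print(read_back([]))                        # ('ok', True)
print(read_back([(10, 3)]))                 # ('corrected', True)
print(read_back([(10, 3), (200, 5)]))       # ('uncorrectable', False)
print(read_back([(0, 0), (1, 0), (2, 0)]))  # ('corrected', False)
```

In this sketch any odd number of flipped bits produces a syndrome in which every pair differs in exactly one bit, so three errors are always reported as "corrected". In the last case the decoder flips a fourth bit (byte 3, bit 0), leaving four bad bits while reporting success.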
A slightly trivial case:
Again assuming the ECC algorithm used by mtd, the ECC bytes for a chunk of
data where all the bytes have the same value are 0xFFFFFF, regardless of
that value. So, say you have a page full of 0xA3; the ECC is then
0xFFFFFF. Now assume a partial write causes bit 2 of every byte to stay
at 1 instead of changing to 0 during programming. The result is a page
full of 0xA7; in effect, 256 bit errors (assuming a page size of 256
bytes, or at least an ECC calculation covering that many bytes). But the
ECC will still be 0xFFFFFF, so the ECC check on readback will say that
the data is correct. That is because, as I mentioned before, the result of
an ECC calculation on data with more than 2 bit errors is undefined.
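The all-bytes-equal case can be checked with a small model. In a line/column parity ECC, each line-parity bit XORs the parities of 128 bytes and each column-parity bit covers the same four bit positions in all 256 bytes, so when every byte has the same value each parity is an XOR of an even number of identical terms and comes out 0; after inversion the ECC is 0xFFFFFF whatever that value is. `hamming_ecc` below is a hypothetical model of mtd's scheme with an assumed bit packing, not the kernel's code.

```python
def hamming_ecc(data: bytes) -> int:
    """24-bit line/column parity ECC over 256 bytes, stored inverted (hypothetical layout)."""
    assert len(data) == 256
    lp = [0] * 16  # line parity: pair per byte-index bit
    cp = [0] * 6   # column parity: pair per bit-index bit
    for i, byte in enumerate(data):
        byte_parity = bin(byte).count("1") & 1
        for k in range(8):
            lp[2 * k + ((i >> k) & 1)] ^= byte_parity
        for b in range(8):
            for j in range(3):
                cp[2 * j + ((b >> j) & 1)] ^= (byte >> b) & 1
    code = 0
    for n, parity in enumerate(lp + cp):
        code |= parity << n
    return ~code & 0xFFFFFF


good = bytes([0xA3]) * 256  # what should have been programmed
bad = bytes([0xA7]) * 256   # bit 2 never went from 1 to 0 in any byte
bit_errors = sum(bin(g ^ b).count("1") for g, b in zip(good, bad))

print(hex(hamming_ecc(good)), hex(hamming_ecc(bad)), bit_errors)  # 0xffffff 0xffffff 256
```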
Note that there are other ECC algorithms that can correct more bit
errors. For MLC flash it is recommended to use an algorithm that corrects
4 bit errors in a block of data rather than a single bit error. Such
algorithms require more ECC bits, though.
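As a rough illustration of the extra cost, assume a binary BCH code (one common choice for multi-bit NAND ECC; the post does not name an algorithm). A BCH code over GF(2^m) that corrects t bit errors needs at most m*t parity bits, where m is the smallest value for which the codeword fits in 2^m - 1 bits. The helper `bch_ecc_bytes` is hypothetical and only estimates sizes; it does not encode anything.

```python
import math


def bch_ecc_bytes(data_bytes: int, t: int) -> int:
    """Estimate ECC bytes for a binary BCH code correcting t bit errors in data_bytes bytes."""
    data_bits = 8 * data_bytes
    m = 1
    # Smallest field GF(2**m) whose code length 2**m - 1 holds the data plus m*t parity bits.
    while (1 << m) - 1 < data_bits + m * t:
        m += 1
    return math.ceil(m * t / 8)  # round the m*t parity bits up to whole bytes


for t in (4, 8):
    print(f"t={t}: {bch_ecc_bytes(512, t)} ECC bytes per 512-byte sector")
```

This gives 7 bytes for t = 4 and 13 bytes for t = 8 per 512-byte sector, compared with 3 bytes per 256 bytes (6 per 512) for mtd's 1-bit Hamming scheme.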
One has a tendency to think of ECC as a checking algorithm. It is not. It
corrects and detects bit errors under certain circumstances. Outside those
circumstances it is worthless. For the software algorithm used in mtd, it
is worthless if there are more than 2 bit errors. A failed write could
cause any number of bit errors, so using the ECC algorithm to check
whether a write succeeded is pointless. The normal failure mode of a NAND
flash chip is isolated, random single-bit errors occurring with low
probability. ECC handles this elegantly.
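The CRC suggested in the quoted message is a checking algorithm in exactly the sense that ECC is not. A minimal sketch, assuming a CRC-32 (here Python's `zlib.crc32`) is computed when the page is written and stored alongside it; `page_is_intact` is a hypothetical helper. It catches the 0xA3/0xA7 partial write from the example above, which the ECC accepts.

```python
import zlib

PAGE = 256  # bytes, matching the all-0xA3 example


def page_is_intact(data: bytes, stored_crc: int) -> bool:
    """Recompute CRC-32 on readback; any mismatch means the page is not what was written."""
    return zlib.crc32(data) == stored_crc


written = bytes([0xA3]) * PAGE
stored_crc = zlib.crc32(written)  # computed at write time, stored next to the data
read_back = bytes([0xA7]) * PAGE  # partial write: bit 2 stayed 1 in every byte

print(page_is_intact(written, stored_crc))    # True
print(page_is_intact(read_back, stored_crc))  # False
```

A 32-bit CRC is not infallible either (a random corruption slips through with a probability of about 2^-32), but unlike the ECC its ability to detect errors does not depend on the number of bit errors being small.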
Elaborating on this slightly, a devil's advocate would consider ECC
worthless as a correction algorithm for a flash chip. Assume one bit error
occurs; ECC corrects it on every read, but the bit in the flash itself
stays wrong. Then another bit error occurs in the same page. ECC then
detects an uncorrectable error. Then another bit error occurs. The ECC
algorithm is now worthless: it may detect the error, it may say that the
data is correct, or it may even try to correct it (erroneously).
The reason all this works in practice is that the probability of a bit
error occurring is so low that the probability of two bit errors
occurring in the same page is very low, often lower than that of other
failure modes in the system, so we don't have to worry about it. (For
example, what about bit errors in RAM chips caused by cosmic radiation?
It is a real risk, but so small that most systems don't have to worry
about it.)
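The argument is ordinary binomial arithmetic. A minimal sketch, assuming bit errors are independent with a hypothetical per-bit error probability `P_BIT` (1e-9 below is an illustrative number, not a figure for any real chip) over one 256-byte ECC block of 2048 bits. The chance of two or more errors comes out at roughly (n*p)^2/2, about a million times smaller than the chance of one or more.

```python
def prob_at_least(k_min: int, n_bits: int, p: float, k_max: int = 50) -> float:
    """P(at least k_min of n_bits independent bits are in error), summing binomial terms up to k_max."""
    term = (1.0 - p) ** n_bits  # probability of exactly 0 errors
    total = 0.0
    for k in range(k_max + 1):
        if k >= k_min:
            total += term
        # Binomial ratio: P(k + 1) = P(k) * (n - k) / (k + 1) * p / (1 - p)
        term *= (n_bits - k) / (k + 1) * p / (1.0 - p)
    return total


N_BITS = 256 * 8  # one 256-byte ECC block
P_BIT = 1e-9      # hypothetical per-bit error probability, for illustration only

one_or_more = prob_at_least(1, N_BITS, P_BIT)  # ECC has something to correct
two_or_more = prob_at_least(2, N_BITS, P_BIT)  # beyond what 1-bit ECC can correct
print(f"{one_or_more:.3e} {two_or_more:.3e}")
```

With these assumed numbers the sketch gives roughly 2.0e-6 for one or more errors and 2.1e-12 for two or more. Summing terms iteratively avoids computing 1 - P(0) - P(1), which would lose precision to cancellation.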
It can be a real concern though, and that is why things like UBI provide
so-called bit scrubbing: whenever it detects that ECC has done a bit
correction in a block, it erases that block and rewrites the data [lots of
details omitted here] so that the chance of two bit errors ever occurring
in the same page will be very small indeed.
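The scrubbing idea can be sketched as a toy model. This is not UBI's implementation: `ScrubbingStore`, `Block` and the simulated `bitflips` counter are hypothetical, and the ECC is assumed to have already returned corrected data. Rather than erasing in place, the sketch writes the data to a freshly erased block first and only then erases the old one, so the data is never held only in RAM.

```python
from dataclasses import dataclass


@dataclass
class Block:
    data: bytes
    bitflips: int = 0  # simulated: bits ECC had to correct when this block was last read


class ScrubbingStore:
    """Toy flash: logical blocks map to physical blocks; corrected reads trigger a rewrite."""

    def __init__(self, n_physical: int):
        self.phys: list = [None] * n_physical  # physical block -> Block, or None when erased
        self.free = list(range(n_physical))    # erased physical blocks ready for writing
        self.map: dict = {}                    # logical block -> physical block

    def write(self, lnum: int, data: bytes) -> None:
        pnum = self.free.pop(0)
        self.phys[pnum] = Block(data)
        old = self.map.get(lnum)
        self.map[lnum] = pnum
        if old is not None:
            self._erase(old)  # the old copy is erased only after the new one exists

    def _erase(self, pnum: int) -> None:
        self.phys[pnum] = None
        self.free.append(pnum)

    def read(self, lnum: int) -> bytes:
        block = self.phys[self.map[lnum]]
        data = block.data  # assumed to be already ECC-corrected
        if block.bitflips > 0:
            # Scrub: move the data to a freshly erased block so bit errors cannot accumulate.
            self.write(lnum, data)
        return data


store = ScrubbingStore(n_physical=4)
store.write(0, b"payload")
first = store.map[0]
store.phys[first].bitflips = 1  # pretend ECC corrected one bit in this block
data = store.read(0)
print(first, store.map[0])      # 0 1
```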
Especially in these days of ever larger flash chips resulting from
shrinking chip geometries, this problem is getting worse and worse. It
also tends to vary hugely among manufacturers.
/Ricard
--
Ricard Wolf Wanderlöf ricardw(at)axis.com
Axis Communications AB, Lund, Sweden www.axis.com
Phone +46 46 272 2016 Fax +46 46 13 61 30
More information about the linux-mtd mailing list