Questions about NAND (double)bit errors
manningc2 at actrix.gen.nz
Wed Feb 8 17:26:10 EST 2006
On Friday 03 February 2006 00:12, Wolfgang Mües wrote:
> I want to use JFFS2/MTD in an embedded Linux device with frequent
> writes (worst case is 15 KBytes per 10 seconds, typical case is less than
> 10% of the worst case). The device will be a 512 MBit NAND SLC type from
> Hynix, Samsung or STM. We have a working prototype, and we have read many
> NAND flash papers available on the net, and the recent MTD mailing list
> Besides wear leveling questions, there are program disturb errors
> (programming a page flips a bit in another page) and read disturb errors
> (reading a page flips a bit). Rates for these single-bit-errors are
> available in publications from M-systems and Toshiba.
> But since single bit errors are easily corrected by ECC, I am more
> interested in errors where more than 1 bit is flipped in a 256 byte ECC
> area. We cannot calculate these error numbers from the single bit errors
> because we don't know if these errors are unrelated to each other.
If you have not already done so, read the Toshiba NAND flash application
notes; they might give some further info.
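
As a rough baseline only: if bit flips really were independent (which, as
noted above, nobody has shown), the chance of two or more flips landing in
one 256 byte ECC area follows from the binomial distribution. A minimal
sketch in Python; the function name and any per-bit rate you feed it are
hypothetical placeholders, not measured figures:

```python
import math

def p_multi_bit(p_bit: float, n_bits: int = 256 * 8) -> float:
    """P(at least 2 flipped bits among n_bits), ASSUMING independent flips.

    p_bit is a per-bit flip probability (a placeholder, not vendor data).
    """
    if not 0.0 < p_bit < 1.0:
        raise ValueError("p_bit must be strictly between 0 and 1")
    log_p = math.log(p_bit)
    log_q = math.log1p(-p_bit)  # log(1 - p_bit), accurate for tiny p_bit
    total = 0.0
    # Sum the binomial terms for k = 2..n_bits in log space, so the huge
    # binomial coefficients and tiny powers never overflow or cancel.
    for k in range(2, n_bits + 1):
        log_choose = (math.lgamma(n_bits + 1)
                      - math.lgamma(k + 1)
                      - math.lgamma(n_bits - k + 1))
        total += math.exp(log_choose + k * log_p + (n_bits - k) * log_q)
    return total

# Usage (hypothetical rate): p_multi_bit(1e-9)
```

If the errors are correlated (for example, a block that is wearing out),
the real figure could be much worse than this estimate, so treat it as a
starting point at best.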
> Is there any information available to estimate/calculate the remaining
> errors after ECC correction? Or is there any information about first hand
> experience of NAND stress tests or other real world experience?
> Maybe the NAND project is terminated if I don't find anything about
> practical reliability...
I have not used JFFS2, but I have done extensive testing with YAFFS. At the
NAND level they should be about the same.
I have done a few accelerated lifetime tests that have gone very well. In one
test (run once on 512byte page devices and once on 2k page devices) I wrote,
read back and verified over 120Gbytes of data to the fs without a single bit
getting lost. Other people did similar tests too. This was on non-Linux
devices, but that's not material at the NAND level.
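
For anyone wanting to run something similar, one pass of a write, read back
and verify test can be sketched roughly as below. This is not the test
harness I used; the function name, the chunk sizes and the idea of running
it against a file on the mounted flash fs are all assumptions for
illustration:

```python
import hashlib
import os

def write_verify_pass(path: str, chunk_size: int, n_chunks: int) -> int:
    """Write random chunks to path, read them back, return the bad-chunk count."""
    digests = []
    with open(path, "wb") as f:
        for _ in range(n_chunks):
            chunk = os.urandom(chunk_size)
            # Keep only a digest so large runs do not need the data in RAM.
            digests.append(hashlib.sha256(chunk).digest())
            f.write(chunk)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data to the device

    bad = 0
    with open(path, "rb") as f:
        for expected in digests:
            if hashlib.sha256(f.read(chunk_size)).digest() != expected:
                bad += 1
    # Caveat: the read-back may be served from the OS cache. A real test
    # would remount the fs or drop caches between the write and the read.
    return bad

# Usage (hypothetical path on the flash fs):
#   write_verify_pass("/mnt/flash/stress.bin", 2048, 1024)
```

Repeating such passes, and deleting the file in between, is what
accumulates the total volume written.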
From my observations NAND is very reliable and is getting more reliable all
the time.
There are at least two factors that might be different for JFFS2 vs YAFFS:
* Most flash reliability is specified based on an assumption that you perform
a maximum number of writes per page. I don't know what JFFS2 does, but YAFFS
does one major write and then writes a single byte deletion marker to the OOB
area when the page is discarded. YAFFS2 does not write deletion markers. This
is generally well within the write limits used for the specification, so the
flash should be less stressed than was used to derive the specs. JFFS2 might
be different here.
* YAFFS is very conservative in dealing with ECC failures. YAFFS retires a
block if one ECC failure is seen. JFFS2, IIRC, allows five or so failures
before retiring a block. The Toshiba folk have told me that if a block is
going bad, it is most likely to start displaying recoverable 1-bit errors
before displaying non-recoverable multi-bit errors. Thus, YAFFS will
potentially perform differently in this area.
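
The difference in retirement policy can be pictured with a toy model like
the one below. This is not code from YAFFS or JFFS2; the class and method
names are made up, and the threshold of 5 is only my recollection of what
JFFS2 does:

```python
from collections import Counter

class RetirementPolicy:
    """Toy model: retire a block after `threshold` ECC failures on it."""

    def __init__(self, threshold: int):
        self.threshold = threshold
        self.ecc_events = Counter()  # block number -> failures seen so far
        self.retired = set()

    def record_ecc_event(self, block: int) -> bool:
        """Record one ECC failure; return True if the block is now retired."""
        if block in self.retired:
            return True
        self.ecc_events[block] += 1
        if self.ecc_events[block] >= self.threshold:
            self.retired.add(block)
        return block in self.retired

# Hypothetical thresholds: 1 for the YAFFS-like policy described above,
# 5 for the JFFS2-like one (from memory, not verified).
yaffs_like = RetirementPolicy(threshold=1)
jffs2_like = RetirementPolicy(threshold=5)
```

With the lower threshold, a block that starts showing recoverable 1-bit
errors is taken out of service before it gets the chance to show a
multi-bit error.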
Still, I think those reliability differences, at the flash level, are more than
likely theoretical noise and are unlikely to be material in the real world.
One important factor, IMHO, is how you handle the write protect pin on the
NAND. Some people tie the WP to the power supply failure flag. IMHO this is a
bad thing to do since it can cause incomplete writes to happen if WP is
asserted during a write or erase cycle.