yaffs power fail reliability
sergei.sharonov at halliburton.com
Fri Jun 3 12:39:27 EDT 2005
I beg your pardon if this is not exactly the right mailing list for this, but
I posted on the yaffs mailing list twice and got no response. So here we go:
*** Did anybody perform power cycling tests on yaffs? ***
This is not an idle question. Yaffs is supposed to be robust with respect to
power failures. I designed a test that produces filesystem corruption after
about 100 power cycles. The symptoms are flipped bits in files and some files
growing in size (a bunch of zeros appended). After looking at the yaffs code
it seems to me that a few corner cases are not handled:
1. The clean marker is not used. AFAIK that may lead to using partially erased
blocks, with bits flipping back and forth (as in the original jffs).
2. The garbage collector does not check ecc on data before copying it, i.e. if
the tags are ok but the data is bad, the bad data will be copied and a new,
good ecc will be assigned to it. Hmm, on second thought, I do not recall
whether ecc is checked even on the tags there.
3. A few places may benefit from checking return codes (and handling errors).
4. Something else that I missed ;-) Some corruptions I cannot account for.
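To illustrate point 2, here is a minimal sketch of a garbage-collection copy
that refuses to move a page whose data no longer matches its stored check
value. This is not yaffs code: struct page, compute_check(), page_write() and
gc_copy_page() are hypothetical names, the page is modelled in memory, and a
plain Adler-32 style checksum stands in for a real NAND ECC (which could also
correct single-bit errors rather than only detect them).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_DATA_SIZE 512

/* Hypothetical in-memory model of one NAND page (not a yaffs structure). */
struct page {
    uint8_t  data[PAGE_DATA_SIZE];
    uint32_t ecc;   /* check value stored alongside the data */
};

/* Stand-in for a real ECC: an Adler-32 style checksum.
 * It detects corruption but, unlike a NAND ECC, cannot correct it. */
static uint32_t compute_check(const uint8_t *buf, size_t len)
{
    uint32_t a = 1, b = 0;
    for (size_t i = 0; i < len; i++) {
        a = (a + buf[i]) % 65521u;
        b = (b + a) % 65521u;
    }
    return (b << 16) | a;
}

/* Write data to a page and store a matching check value. */
static void page_write(struct page *p, const uint8_t *buf)
{
    memcpy(p->data, buf, PAGE_DATA_SIZE);
    p->ecc = compute_check(p->data, PAGE_DATA_SIZE);
}

/* Copy src to dst during garbage collection.
 * Returns 0 on success, or -1 if the source data fails its check, in which
 * case dst is left untouched so bad data never receives a fresh, valid
 * check value. */
static int gc_copy_page(const struct page *src, struct page *dst)
{
    if (compute_check(src->data, PAGE_DATA_SIZE) != src->ecc)
        return -1;
    page_write(dst, src->data);
    return 0;
}
```

A caller would treat a -1 return as a reason to attempt correction or to mark
the chunk as damaged, instead of silently carrying the corruption forward.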
Unfortunately there is _no_ alternative to yaffs at the moment when using
large NAND devices. JFFS3 would be nice but it is still in the design stage.
So, as I see it there are two immediate possibilities:
1. I have bad hardware and/or a bad driver, and YAFFS may be used just fine as
is. The argument against that is that I've pushed 140 GBytes of data through
the NAND chip using jffs2, with no power cycling, without trouble. Besides,
TGLX looked at the driver and did not see a problem with it ;-)
2. We, the people with big NAND FLASH devices ;-), test the heck out of
yaffs and fix what is broken. I've got a few ideas if there is interest.
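For anyone who wants to look for the same symptoms (flipped bits, files grown
by a run of zeros), here is a minimal sketch of the check a power-cycle test
could run after each reboot. It is an assumption about how such a test might
be built, not the test described above: fill_pattern(), verify_pattern() and
the LCG-based pattern are hypothetical, and the check works on a buffer that
the harness has already read back from the file.

```c
#include <stddef.h>
#include <stdint.h>

enum verify_result {
    VERIFY_OK,               /* length and content both match */
    VERIFY_CONTENT_MISMATCH, /* some byte differs, e.g. a flipped bit */
    VERIFY_ZERO_PADDED,      /* content matches but zeros were appended */
    VERIFY_WRONG_LENGTH      /* shorter, or longer with non-zero tail */
};

/* Next byte of a deterministic pattern (simple LCG, chosen for illustration).
 * The low bit is forced on so the pattern never contains a zero byte, which
 * keeps appended zeros distinguishable from real data. */
static uint8_t pattern_byte(uint32_t *state)
{
    *state = *state * 1664525u + 1013904223u;
    return (uint8_t)(((*state >> 16) & 0xFFu) | 0x01u);
}

/* Fill buf with the pattern for this seed before writing it to a file. */
static void fill_pattern(uint8_t *buf, size_t len, uint32_t seed)
{
    uint32_t state = seed;
    for (size_t i = 0; i < len; i++)
        buf[i] = pattern_byte(&state);
}

/* Compare what was read back (buf, actual_len) against what was written
 * (seed, expected_len) and classify the outcome. */
static enum verify_result verify_pattern(const uint8_t *buf, size_t actual_len,
                                         uint32_t seed, size_t expected_len)
{
    uint32_t state = seed;
    size_t common = actual_len < expected_len ? actual_len : expected_len;

    for (size_t i = 0; i < common; i++)
        if (buf[i] != pattern_byte(&state))
            return VERIFY_CONTENT_MISMATCH;

    if (actual_len == expected_len)
        return VERIFY_OK;
    if (actual_len < expected_len)
        return VERIFY_WRONG_LENGTH;

    for (size_t i = expected_len; i < actual_len; i++)
        if (buf[i] != 0)
            return VERIFY_WRONG_LENGTH;
    return VERIFY_ZERO_PADDED;
}
```

Deriving the seed from the file name would let the checker run without any
state surviving the power cut. A file that was still being written when power
was lost may legitimately be short, so the harness should only verify files
it had finished and synced before the cut.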
More information about the linux-mtd