UBIFS recovery fails

Artem Bityutskiy dedekind1 at gmail.com
Wed Oct 19 11:15:15 EDT 2011


On Tue, 2011-10-18 at 10:29 +0200, Ivan Djelic wrote:
> That's interesting... Do you have more details or any data on those eMMC
> power-cut failures ?

Not much. We were testing eMMC and trying to make sure that if we
sync the data and then have a power cut, we never lose the data which
was synced. We had a test which worked directly with the block device,
so no file-system was involved. In some cases the eMMC had sectors
which had been reported as already written, but which were corrupted.
The vendor later said that yes, there was a FW bug, and promised to fix
it in the next revision.
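
To make the idea concrete, here is a minimal sketch of that kind of
test. It is not the test we actually ran; all names (SECTOR,
make_sector, write_and_sync, find_lost_sectors) are made up for
illustration, and the power cut itself has to come from outside
(a relay, a lab power supply, etc.). The path is assumed to be a block
device node or a plain file standing in for one.

```python
import os
import struct

SECTOR = 512  # assumed sector size, for illustration only


def make_sector(seq):
    """Build the expected content of sector `seq`: an 8-byte sequence
    number followed by a pattern derived from it."""
    header = struct.pack("<Q", seq)
    payload = bytes((seq + i) & 0xFF for i in range(SECTOR - len(header)))
    return header + payload


def write_and_sync(path, first_seq, count):
    """Write `count` sectors starting at sector `first_seq`, then fsync.

    Returns the number of leading sectors that may be treated as synced.
    Record this number somewhere that survives the power cut."""
    fd = os.open(path, os.O_WRONLY)
    try:
        for seq in range(first_seq, first_seq + count):
            os.pwrite(fd, make_sector(seq), seq * SECTOR)
        os.fsync(fd)  # only after this returns is the data "synced"
    finally:
        os.close(fd)
    return first_seq + count


def find_lost_sectors(path, synced_count):
    """After power-up, return the sectors that were synced before the
    power cut but no longer hold the expected content."""
    lost = []
    fd = os.open(path, os.O_RDONLY)
    try:
        for seq in range(synced_count):
            if os.pread(fd, SECTOR, seq * SECTOR) != make_sector(seq):
                lost.append(seq)
    finally:
        os.close(fd)
    return lost
```

Any sector returned by find_lost_sectors() was acknowledged by fsync
and still got lost or corrupted, which is the kind of failure
described above.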

eMMC FW is written by humans as well :-)

> I plan to be working soon (December) on UBIFS robustness issues with unstable
> modern SLCs; besides using nandsim to simulate SLC (and maybe MLC) issues,
> I also have real hardware with a power-cutting framework ready for testing.

I suggest you improve the UBIFS power cut emulation functions and
make them emulate unstable bits, and then use integck, which is already
able to handle emulated power cuts. This will allow you to:

1. Test quickly
2. Continue the half-done work
3. Work with a nicer code base than the ugly nandsim
4. Emulate unstable bits in interesting places like the TNC, the LPT,
   the orphans area, etc. Otherwise most of the failures will be
   emulated in the data area.


Similarly, something like that should be done at the UBI level, which
would emulate power cuts _only_ when writing UBI-specific stuff (e.g.,
the headers).
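
A minimal sketch of such selective injection, assuming every write is
tagged with what is being written. This is not the UBI code; the tags
("ec_hdr", "vid_hdr", "data") and all class and function names are
made up for illustration.

```python
import random
from dataclasses import dataclass, field


class EmulatedPowerCut(Exception):
    """Raised in place of a real power cut."""


@dataclass
class PowerCutInjector:
    targets: frozenset      # write kinds that may be interrupted
    probability: float      # chance of a cut per targeted write
    rng: random.Random
    counts: dict = field(default_factory=dict)

    def before_write(self, kind):
        """Call before every write; raises only for targeted kinds."""
        self.counts[kind] = self.counts.get(kind, 0) + 1
        if kind in self.targets and self.rng.random() < self.probability:
            raise EmulatedPowerCut(kind)


def emulated_write(injector, flash, kind, offset, buf):
    """Write `buf` into the bytearray `flash` unless a cut is injected."""
    injector.before_write(kind)
    flash[offset:offset + len(buf)] = buf
```

With targets=frozenset({"ec_hdr", "vid_hdr"}), data writes always go
through and only header writes can be interrupted, so the test time is
spent on the UBI-specific recovery paths.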

Something at the driver level can also be done later.

I know you are a driver guy and it is more natural for you to start
from the driver, but I suggest starting from UBIFS and fixing 90% of
the issues there, then going down. This way you will also isolate the
non-UBIFS-specific issues.

Anyway, we should start with _documenting_:
1. What unstable bits are.
2. What work UBIFS/UBI/MTD need to do to handle them.
3. What the MLC-specific issues are.
4. What would have to be done to handle them.

I have ideas about the paired pages in MLC.

But the other thing is that the whole stack is complex and big and
has a lot of states (like any FS), so it is easy to miss something, and
you never know the complete list until you actually start stressing the
stack.

But let's document what we know at the moment. Then people who are
interested in having that fixed can start working on it.

-- 
Best Regards,
Artem Bityutskiy