UBIFS recovery fails

Jean-Sébastien Gagnon jsgagnon at vizimax.com
Tue Oct 18 11:10:32 EDT 2011


The situation you described should already be handled correctly by UBIFS, provided the NAND driver correctly reports pages with bitflips by returning -EUCLEAN. In that case, UBI moves the data from the affected PEB to a new PEB (scrubbing) as soon as possible to avoid this problem.

My comment was really about the original error posted by Daniel Kuhn:

>>UBIFS: recovery needed
>>UBIFS error (pid 611): ubifs_recover_leb: corrupt empty space LEB 3550:188416, corruption starts at 64362
>>UBIFS error (pid 611): ubifs_scanned_corruption: corruption at LEB 3550:252778
>>UBIFS error (pid 611): ubifs_scanned_corruption: first 1174 bytes from LEB 3550:252778
>>00000000: ffffffbf ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff  ................................
>>00000020: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff  ................................
...

In this case, it seems that a bitflip occurred on a blank page (note the single cleared bit in the first word, 0xffffffbf), or perhaps the page was partially programmed (only very slightly).



--------------------------------------------------------------------------------------

On Tue, Oct 18, 2011 at 01:47:26PM +0100, Jean-Sébastien Gagnon wrote:
> Hi,
> Actually, I think the empty-space corruption is the only thing to address in this
> specific problem, since any other error caused by unstable bits on valid data should be
> corrected by the ECC parity bits in the flash driver.

Hi Jean-Sébastien,

If you cut power during a page program operation, you can easily get more
unstable bits than the manufacturer-specified ECC can correct (for instance,
3 unstable bits on a device that requires 1-bit ECC). We have seen this on
several different devices.
Having many bitflips (more than the ECC can correct) is not the problem here:
the page was indeed partially programmed, so it contains garbage and its
contents should be discarded.

The real problem appears when those faulty bits are unstable: during the first
few read attempts, the page may be read successfully (possibly with ECC
corrections); then, a little later, the page becomes unreadable because of too
many faulty bits.

Therefore, software using MTD (UBI, UBIFS) cannot rely merely on being able to
read a page at some point to conclude that this page stores data reliably.
It should also be able to track power failures, and treat the NAND area that
was being modified (programmed or erased) during the power cut as potentially
unstable.

HTH,
-- 
Ivan


More information about the linux-mtd mailing list