ubifs_scan() error handling
twebb
taliaferro62 at gmail.com
Mon Jan 31 20:26:59 EST 2011
>> It's been a while, but I wanted to reopen the discussion on this topic.
>> Could you take a look at this proposed patch? Essentially this change
>> results in the LEB being cleaned/recovered regardless of whether
>> is_last_write() is true or not. There may be a better way to do this
>> earlier in the same function, but I'm not familiar enough with it to
>> know the significance of the is_last_write() call.
>
> But I think I explained why this check is there. Why exactly does it not
> work for your flash? I think you need to get a better understanding of
> what is happening in your case. I am reluctant to take this patch because
> it is more of a band-aid than a proper solution.
>
I don't recall an explanation of why that check is there, but
regardless, I'll try to explain why things are failing on my flash.

Let's take the case of a read or write disturb error causing multiple
bit flips in the empty space of a LEB (not in the common header node
magic number location). I believe this type of error is very common
with MLC, since ECC will generally handle bit flips in non-0xFF areas,
so the flips that remain visible are the ones in erased (0xFF) space.

LEB: | good nodes | 0xFFs | bit flip | 0xFFs | bit flip | 0xFFs |

When ubifs_recover_leb() calls ubifs_scan_a_node(), the latter correctly
returns SCANNED_EMPTY_SPACE. ubifs_recover_leb() then finds that the
LEB buf is not empty. It also finds that is_last_write() is false, so
it breaks without setting empty_chkd. Later, because !empty_chkd,
!is_empty() and !is_last_write() are all true, the LEB is marked as
corrupted. This may ultimately result in a failure to mount or in a
read-only mount.
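
The decision path described above can be sketched as a small user-space
model. This is only an illustration under stated assumptions, not the
kernel code: every model_* name is hypothetical, the 64-byte LEB and
min_io of 8 are made-up sizes, and the last-write test is assumed to mean
"everything from the next min_io boundary to the end of the LEB is 0xFF".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical outcome labels for this model; not kernel identifiers. */
enum model_result { MODEL_EMPTY_OK, MODEL_NEEDS_CLEAN, MODEL_CORRUPTED };

/* Return 1 if every byte in buf[0..len) is 0xFF, else 0. */
static int model_is_empty(const uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (buf[i] != 0xFF)
            return 0;
    return 1;
}

/*
 * Assumption: "last write" means that everything from the next
 * min_io-aligned boundary after offs up to the end of the LEB is 0xFF.
 */
static int model_is_last_write(const uint8_t *leb, size_t leb_size,
                               size_t offs, size_t min_io)
{
    size_t empty_offs = (offs / min_io + 1) * min_io;

    if (empty_offs >= leb_size)
        return 1;
    return model_is_empty(leb + empty_offs, leb_size - empty_offs);
}

/* Classify the space starting at offs, where the scan hit empty space. */
static enum model_result model_check_empty_space(const uint8_t *leb,
                                                 size_t leb_size,
                                                 size_t offs, size_t min_io)
{
    assert(min_io > 0 && offs <= leb_size);
    if (model_is_empty(leb + offs, leb_size - offs))
        return MODEL_EMPTY_OK;
    if (model_is_last_write(leb, leb_size, offs, min_io))
        return MODEL_NEEDS_CLEAN;
    return MODEL_CORRUPTED;   /* non-0xFF bytes beyond the last write unit */
}

/*
 * Build a 64-byte LEB: 16 bytes of stand-in node data, then 0xFF space
 * with one flipped bit at each listed offset, and classify it with
 * offs = 16 and min_io = 8.
 */
static enum model_result model_demo(const size_t *flips, size_t nflips)
{
    uint8_t leb[64];

    for (size_t i = 0; i < sizeof(leb); i++)
        leb[i] = (i < 16) ? 0xA5 : 0xFF;
    for (size_t i = 0; i < nflips; i++) {
        assert(flips[i] >= 16 && flips[i] < sizeof(leb));
        leb[flips[i]] = 0xFE;   /* one bit flipped from 0xFF */
    }
    return model_check_empty_space(leb, sizeof(leb), 16, 8);
}
```

In this model, a single flip inside the first min_io unit after the good
nodes is classified as cleanable, while flips further into the empty
space, as in the LEB layout above, end up in the corrupted branch.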

However, because of the nature of this "corruption", if
ubifs_recover_leb() ignores the is_last_write() result and instead
calls clean_buf() and sets need_clean = 1, then fix_unclean_leb()
ultimately fixes the bit flips via the ubi_leb_change() call, without
data loss.
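
To illustrate why cleaning loses no data in this scenario, here is a toy
stand-in for the cleanup step. It is a sketch under assumptions, not what
clean_buf() or fix_unclean_leb() actually do in the kernel: the model_*
names, the 64-byte LEB and the 16 bytes of node data are all hypothetical.
It resets the tail of an in-memory LEB image to 0xFF and reports how many
bytes changed, which is the information a caller would need to decide
whether the LEB has to be rewritten.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for the cleanup step: reset everything from offs
 * to the end of the in-memory LEB image to 0xFF. Returns the number of
 * bytes that had to be changed; non-zero plays the role of need_clean.
 */
static size_t model_clean_tail(uint8_t *leb, size_t leb_size, size_t offs)
{
    size_t changed = 0;

    assert(offs <= leb_size);
    for (size_t i = offs; i < leb_size; i++) {
        if (leb[i] != 0xFF) {
            leb[i] = 0xFF;
            changed++;
        }
    }
    return changed;
}

/*
 * Demo: 16 bytes of stand-in node data followed by erased space with one
 * flipped bit at each listed offset (each must lie in 16..63). Returns
 * the number of repaired bytes, or -1 if the node data in front of the
 * cleaned region was modified by the cleanup.
 */
static int model_demo_repair(const size_t *flips, size_t nflips)
{
    uint8_t leb[64], before[16];
    size_t changed;

    for (size_t i = 0; i < sizeof(leb); i++)
        leb[i] = (i < 16) ? (uint8_t)(i + 1) : 0xFF;
    memcpy(before, leb, sizeof(before));

    for (size_t i = 0; i < nflips; i++) {
        assert(flips[i] >= 16 && flips[i] < sizeof(leb));
        leb[flips[i]] = 0xFE;   /* one bit flipped from 0xFF */
    }

    changed = model_clean_tail(leb, sizeof(leb), 16);
    if (memcmp(before, leb, sizeof(before)) != 0)
        return -1;              /* good nodes must stay untouched */
    return (int)changed;
}
```

Calling model_demo_repair() with flips at offsets 30 and 50 repairs two
bytes and leaves the node data untouched. The model cannot tell a
disturb-induced flip from a partially written node, though: both are
just non-0xFF bytes, and both would be wiped.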

Does this make sense, or is my logic wrong? I think it's OK, assuming
that no true corruption (associated with a power loss) happens in
conjunction with the read/write disturb.

I do understand that this is perhaps more of a band-aid than a proper
solution. However, I'm trying to understand whether it is reasonable
and whether you think it does more good than harm.

twebb