[RFC PATCH] UBIFS: remove check for all(0xff) for empty pages

Tue Mar 11 07:41:00 EDT 2014

Hi Artem,

Thanks for this detailed explanation. I have few more queries, after
going through the presentation [1]

>From: Artem Bityutskiy [mailto:dedekind1 at gmail.com]
>>On Tue, 2014-03-11 at 11:53 +0530, Pekon Gupta wrote:
>> UBIFS throws following error if an blank page does not contain perfect all(0xff)
>> data, during LEB scan, even though if number of bit-flips is within ecc.strength
>> 	"UBIFS error (pid 1): ubifs_scan: corrupt empty space at LEB 417:53168"
>>
>> But newer technology devices (like MLC NAND) are prone to read-disturb bit-flips
>> and so having perfect all(0xff) data on empty-pages is practically not feasible
>> on these newer technology devices (like MLC NAND).
>> Also, such device spec suggest using stronger ECC schemes (like BCH16), and have
>> large OOB size to guarantee longer field life.
>>
>> So this check of perfect all(0xff) data can be safely removed from UBIFS, as
>> UBI and MTD layers already these before passing the data to UBIFS:
>>  if (number-of-bitflips < mtd->bitflip_threshold)
>>         Driver's ECC scheme is strong enough to handle existing bit-flips in
>>         future, so both MTD and UBI ignore such bit-flips.
>>  if (number-of-bitflips >= mtd->bitflip_threshold)
>>         MTD flags -EUCLEAN.
>>         UBI schedules the block for scrubbing.
>>  if (number-of-bitflips > nand_chip->ecc.strength)
>>         MTD flags -EBADMSG.
>>         UBI tries to extract data relevant data from block based on sanity of
>>         its headers, and then schedule it for re-erase.
>>
>> Signed-off-by: Pekon Gupta <pekon at ti.com>
>> ---
[...]
>
>Let me try to explain why this code is present in UBIFS. When we
>designed and implemented UBIFS we assumed that empty space cannot have
>bit-flips. And NANDs that we had never exposed this kind of issues.
>
>The other assumption that we were kept in mind is that we cannot trust
>the media. The flash may contain any garbage, the contents may be
>inconsistent, there may be any random corruptions anywhere.
>
>This is why we have all these CRC32 checksums everywhere, and all this
>"validate this and that, check the sanity" functions.
>
>We explicitly did not want to try handling random corruptions. Simply
>because this is dangerous. E.g., you have one node corrupted, and it
>contains important data, and UBIFS just kills it - you have no chance to
>recover your important data any more.
>
>We thought that random corruptions handling belongs to user-space.
>User-space tools can ask questions, have many options, preserve data in
>a separate file, etc.
>
>However, there is one type of corruptions we _did_ want to gracefully
>handle - corruptions caused by power cuts.
>
>So in the kernel space we tried to be very very careful in
>distinguishing between power cut corruptions and random corruptions.
>
>Now, power cut corruptions have the following properties:
>
>1. Can only happen in journal LEBs.
>2. Happen only in one single write unit (max_write_size), because UBIFS
>always writes one write unit (or less) at a time.
>3. The next write unit (the one coming after the corrupted write unit)
>should be empty, never written to. Just because we write unit after unit
>sequentially.
>
>And UBIFS simply checks if these conditions hold. If they are, UBIFS
>assumes this is a power cut corruption, and simply wipes all the
>corrupted nodes out. The corrupted nodes will all sit in the corrupted
>write unit, and/or in nodes that _end_ in the corrupted write unit.
>
Is there a way to distinguish PEB(s) serving as part of moving 'Journal'
during device scan ?
- If yes, then is the above all(0xff) check limited to only to those LEB which
  are part of 'un-committed' Journal.
- If no, then this all(0xff) check will apply to all LEB.

But, I don't think UBIFS can distinguish between LEBs as part of uncommitted
Journal,  as these LEBs become part of index tree after commit.  Correct ?
In such case, this all(0xff) check will apply to all LEB during mount. Correct ?

>And what UBIFS recovery procedure does - it simply moves all the good
>nodes from the original PEB to a different PEB, re-maps the LEB to the
>new PEB, and erases the older PEB. And it clears up all the corrupted
>garbage this way, so we end up with a partially-filled LEB. And we even
>keep adding more data there afterwards.
>
How is this different from scrubbing, which is done by UBI layer ?

>Now what you do, you just remove the check for condition #3. Not good, I
>think...
>
>Removing these check means that if there is corruption in the middle of
>a journal LEB, and this corruption is _not_ caused by a power cut, UBIFS
>will kill all the nodes in the corrupted write unit and all the good
>nodes after the corrupted write unit (I did not verify if this is true,
>relying on my memory here).
>
I'm not against recovery, but the way how a corrupted node is detected.
Instead of comparing empty-space by all(0xff) itself, UBIFS should
depend on return-codes from UBI like:
- UBI_IO_FF: clean empty page
- UBI_IO_FF_BITFLIP: empty page with correctable bit-flips
- UBI_IO_BITFLIP: programmed page with correctable bit-flips
- UBI_IO_BADMSG: programmed page with un-correctable bit-flips

>Instead, what is needed is a way to distinguish between "empty" space
>and "written-to space".
>
This is a bigger change, because NAND driver works on units of pages,
while UBIFS works on units of LEBs. A return code of -EBADMSG by NAND
driver cannot be converted into a similar -EBADMSG for the whole LEB.
Thus we need to separate out 'bit-flips in empty region' v/s
'bit-flips in programmed-region'.
This points to my earlier proposal on same [2], can you please review
in current context ?

>For some HW the driver could tell by looking at the OOB.
>
>Or we could simply verify against 0xFFs but allow for a number of 0
>bits, which depends on the HW ECC strength.
>
>Or may be try the former, and if it is not supported by the driver,
>fall-back to the latter.
>
>But I do not think that just removing the checks is a good idea. They
>really help to keep us sane. And they helped already to expose the
>shortcomings of the design.
>

[1] http://www.linux-mtd.infradead.org/doc/ubifs.pdf
[2] http://lists.infradead.org/pipermail/linux-mtd/2014-January/051560.html

with regards, pekon