UBIFS handling of erased page errors

Thu Mar 26 22:12:17 PDT 2015

Hi Artem,

Sorry about the delay, I was doing some more experiments
with stressing "in-the-gaps". :-)

On Tue, 17 Feb 2015 18:59:16 +1100
Artem Bityutskiy <dedekind1 at gmail.com> wrote:

> On Tue, 2015-02-17 at 17:39 +1100, Iwo Mergler wrote:
> > I couldn't see a likely debugfs entry for in-the-gaps allocation,
> > which one is it?
> 
> OK, there is no separate knob for this, but it could be added if
> needed. Right now the feature is enabled with the 'chk_index' knob.
> This knob is about extra index checks, which slow down the FS. But it
> also forces 'in-the-gaps' from time to time.

I find that enabling chk_index triggers false positives
under load - in essence, some obsolete index nodes appear
to go missing and the check fails.

> > I'll go and tell the customer to delete files occasionally.
> 
> Not sure about how deleting is helpful.

Only in the sense that an almost empty file system won't
exercise the dark corners of UBIFS quite so much. ;-)

On the other hand, running a file system check with many
files is a good way to push that. And I found some
interesting things in the process.

I can see two, possibly related, issues. One is
that the current block torture & recycling arrangement
in UBI appears to be a bad match for a real-life failure.

The other is that apparently a single borderline block
can have UBI fail during leb_change with "no free eraseblocks",
leading to R/O mounting.

The kernel messages are attached.

My setup is this. The NAND is a 256MB SLC device, 2K
pages, 100K cycles. A partition of 1200 PEBs (~150MB)
contains UBI and a single UBIFS volume.
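
For reference, the partition size works out as follows. This is
only a back-of-the-envelope check; the 64 pages per erase block
is an assumption (typical for 2K-page SLC parts), not a figure
taken from the datasheet.

```python
# Partition size check. PAGES_PER_PEB is an assumption (typical for
# 2K-page SLC NAND); the other two values come from the setup above.
PAGE_SIZE = 2048        # bytes per page
PAGES_PER_PEB = 64      # assumed, not stated in the setup
PEB_COUNT = 1200        # PEBs in the partition

peb_size = PAGE_SIZE * PAGES_PER_PEB
partition_mib = peb_size * PEB_COUNT / (1024 * 1024)

print(f"PEB size: {peb_size // 1024} KiB")
print(f"Partition: {partition_mib:.0f} MiB")
```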

I have picked a PEB at random (PEB729) and abused it to
the point of failure. Essentially, I used the same method
UBI does for torture, continuously. After 350K erase cycles,
the block erase operation failed.

Restarting the torture would do another 50 or so cycles
before erase failure. The longer the pause, the more
cycles to failure.

Notably, there wasn't a single detected bit error in the
process of this experiment. If, however, I let an erased or
written page rest for a little (a few minutes), the bit errors
show up.

So, what we have here is a block that develops bit errors
very quickly, but an immediate verify operation shows
none.
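
To illustrate why an immediate verify misses this kind of block,
here is a toy model. It is not a measurement: the onset time and
the flip rate are made-up numbers, and ToyWornBlock and verify()
are hypothetical names used only for this sketch.

```python
from dataclasses import dataclass


@dataclass
class ToyWornBlock:
    """Toy model: bit errors appear only after the data has rested."""
    onset_s: float = 120.0       # assumed rest time before the first flip
    flips_per_min: float = 2.0   # assumed growth rate after onset

    def bit_errors(self, rest_s: float) -> int:
        # No errors are visible until the page has rested long enough.
        if rest_s < self.onset_s:
            return 0
        return 1 + int((rest_s - self.onset_s) / 60.0 * self.flips_per_min)


def verify(block: ToyWornBlock, rest_s: float, max_errors: int = 0) -> bool:
    """Return True if a read-back after rest_s seconds looks clean."""
    return block.bit_errors(rest_s) <= max_errors


blk = ToyWornBlock()
immediate_ok = verify(blk, rest_s=0.0)    # torture-style check right after write
deferred_ok = verify(blk, rest_s=300.0)   # same check five minutes later
print(immediate_ok, deferred_ok)
```

In this model the torture-style check passes every time, while
the same check after a few minutes of rest does not.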

I then ran the file system stress test, juggling some 7000
files with random erase, create, truncate and extend
operations, close to 100% fill level.

In this scenario, the broken PEB is accessed every 5 minutes
or so. This is enough to see some correctable bit errors.
UBI then scrubs and sometimes tortures the block.

Either way, the operation is successful every time, since
it takes the bits a few minutes to fall over, and a short
burst of erases doesn't give an error.

So there is continuous churn on this block, but after
hundreds of torture cycles the block is still not marked
as bad.

Which means that, under non-stresstest conditions, the
data in the block has time to develop many bit errors.
If the ECC is able to correct them, we go through another
fruitless torture cycle and re-use the block. If it can't
correct them, we probably lose the file system.

At the moment, UBI doesn't seem to keep track of suspect
PEBs, so any error is treated as if it's the first.

The (maybe) obvious solution is to keep some failure
statistics in the EC header. Maybe something along the
lines of "ec of last torture", "number of tortures",
"likelihood of being a witch", etc.

That would allow repeat offenders to be tested more
thoroughly (e.g. write today, verify tomorrow), and
retire them earlier.
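
A rough sketch of the kind of bookkeeping I have in mind. None of
this exists in UBI today: PebHistory, the field names and both
thresholds are hypothetical, and the real thing would live in the
EC header and the wear-leveling code rather than in Python.

```python
from dataclasses import dataclass


@dataclass
class PebHistory:
    """Hypothetical per-PEB statistics kept alongside the erase counter."""
    ec: int = 0                 # current erase counter
    last_torture_ec: int = -1   # "ec of last torture", -1 if never tortured
    torture_count: int = 0      # "number of tortures"


# Hypothetical policy thresholds, picked only for illustration.
RECENT_EC_WINDOW = 100   # a torture within this many erases is a "repeat"
MAX_TORTURES = 3         # retire the PEB once it has been tortured this often


def on_torture_passed(h: PebHistory) -> str:
    """Decide what to do with a PEB that has just passed torture.

    Returns "reuse", "deferred_verify" or "retire".
    """
    repeat = (h.last_torture_ec >= 0
              and h.ec - h.last_torture_ec <= RECENT_EC_WINDOW)
    h.torture_count += 1
    h.last_torture_ec = h.ec
    if h.torture_count >= MAX_TORTURES:
        return "retire"
    if repeat:
        # Repeat offender: write a pattern now, verify it much later.
        return "deferred_verify"
    return "reuse"


h = PebHistory(ec=1000)
first = on_torture_passed(h)    # first offence
h.ec += 5
second = on_torture_passed(h)   # tortured again only 5 erases later
h.ec += 5
third = on_torture_passed(h)    # third torture
print(first, second, third)
```

The point is only that the second and third failures are no
longer treated like the first one.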

What do you think?

About the second issue, the leb_change failure, can this
be related to the broken block? To me it looks like another
scrub & move of PEB729 has occurred during the leb_change
operation.

Interestingly, after the watchdog reboot and re-mount
of the file system, UBI has 25 PEBs (including PEB729)
hanging around unallocated.

I assume 18 of those are reserved for bad block handling,
7 are actually free, but may not have been when the failure
occurred.

But even so, I think UBI should be allowed to dip into
the reserved pool to complete a leb_move. Another block
will become free in the process.
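
Roughly, the policy I am suggesting looks like the sketch below.
This is not how the UBI code is structured; can_allocate() and
its parameters are hypothetical, and the 18 reserved PEBs are my
assumption from above.

```python
def can_allocate(unallocated: int, reserved_for_bad: int, for_move: bool) -> bool:
    """Hypothetical allocation check.

    unallocated      -- PEBs not currently mapped to any LEB
    reserved_for_bad -- PEBs held back for bad block handling
    for_move         -- True if the PEB is needed to complete a move,
                        which frees the source PEB afterwards
    """
    if unallocated - reserved_for_bad > 0:
        return True     # ordinary free PEBs are available
    # Only a move may borrow from the reserve, because it gives a PEB back.
    return for_move and unallocated > 0


print(can_allocate(25, 18, for_move=False))  # 7 ordinary free PEBs
print(can_allocate(18, 18, for_move=False))  # only the reserve is left
print(can_allocate(18, 18, for_move=True))   # a move may dip into the reserve
```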

Best regards,

Iwo
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: broken_block_log.txt
URL: <http://lists.infradead.org/pipermail/linux-mtd/attachments/20150327/daa14ce0/attachment-0001.txt>