ubifs_decompress: cannot decompress ...

Tue May 31 17:47:19 EDT 2011

On Tue, May 31, 2011 at 12:10 PM, Ben Gardiner
<bengardiner at nanometrics.ca> wrote:
>
> Interesting -- does the trailing 0xff have ECC set, or is it erased
> pages of 0xff?
>
...
>
> Could it be possible that writing the page was interrupted? I guess
> the CRC checks above decompress would catch that though.
>

I verified by adding a call to ubi->mtd->read_oob(): the 0xff data
starts on a page boundary (a whole multiple of 2k in my case).  The
associated OOB area for that page is all 0xff as well.
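
For anyone wanting to repeat this kind of check on a dumped LEB, here
is a minimal user-space sketch.  It assumes the page data and OOB bytes
have already been read into buffers (for example by a read_oob() call
like the one mentioned above).  The helper names all_ff(),
page_looks_erased() and first_ff_page() are made up for illustration
and are not kernel APIs.  All-0xff data plus all-0xff OOB is only
consistent with an erased page; it does not prove it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Return true if every byte in buf[0..len) is 0xff. */
static bool all_ff(const uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (buf[i] != 0xff)
            return false;
    return true;
}

/*
 * Hypothetical helper: a page "looks erased" if both its data area
 * and its OOB area contain nothing but 0xff.
 */
static bool page_looks_erased(const uint8_t *data, size_t page_size,
                              const uint8_t *oob, size_t oob_size)
{
    return all_ff(data, page_size) && all_ff(oob, oob_size);
}

/*
 * Hypothetical helper: scan a page-aligned buffer and return the
 * index of the first page whose data is entirely 0xff, or -1 if
 * there is none.  len must be a whole multiple of page_size.
 */
static long first_ff_page(const uint8_t *buf, size_t len, size_t page_size)
{
    assert(page_size > 0 && len % page_size == 0);

    for (size_t p = 0; p < len / page_size; p++)
        if (all_ff(buf + p * page_size, page_size))
            return (long)p;
    return -1;
}
```

Because the scan walks whole pages, any hit it reports is on a page
boundary by construction; with a 2k page size, page index 1
corresponds to byte offset 2048 in the buffer.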

So I guess this is less about the original LZO error and more about
how a page in the middle of a UBIFS node got erased out from
underneath it.  That looks suspiciously similar to the
ubifs_read_node() error I reported last year:

http://lists.infradead.org/pipermail/linux-mtd/2010-July/031069.html

and which is still showing up from time to time on devices in the
field.  In that case the erased page contained the node header and so
"type" was interpreted as 255; in this case the erased page is in the
middle of a data node, resulting in decompression failure instead.
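
The difference between the two symptoms can be modelled with a small
sketch.  It assumes the UBIFS common header is 24 bytes with the
node_type byte at offset 20 (my reading of struct ubifs_ch in
ubifs-media.h; treat both constants as assumptions).  The function
classify_erased_page() and the enum are hypothetical names, not part
of UBIFS.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layout of the UBIFS common header (struct ubifs_ch). */
#define CH_SIZE            24  /* assumed header size in bytes */
#define CH_NODE_TYPE_OFFS  20  /* assumed offset of node_type   */

enum erased_effect {
    ERASED_MISSES_NODE,        /* page does not overlap the node      */
    ERASED_COVERS_NODE_TYPE,   /* node_type would read back as 0xff   */
    ERASED_ELSEWHERE_IN_NODE   /* some other part of the node is 0xff */
};

/*
 * Hypothetical helper: given a node at [node_offs, node_offs+node_len)
 * and an erased page at [page_start, page_start+page_size), all as
 * byte offsets within the same LEB, report which part of the node
 * the erased page overlaps.
 */
static enum erased_effect classify_erased_page(size_t node_offs,
                                               size_t node_len,
                                               size_t page_start,
                                               size_t page_size)
{
    assert(page_size > 0 && node_len >= CH_SIZE);

    size_t page_end = page_start + page_size;
    size_t node_end = node_offs + node_len;
    size_t type_offs = node_offs + CH_NODE_TYPE_OFFS;

    if (page_end <= node_offs || page_start >= node_end)
        return ERASED_MISSES_NODE;
    if (page_start <= type_offs && type_offs < page_end)
        return ERASED_COVERS_NODE_TYPE;
    return ERASED_ELSEWHERE_IN_NODE;
}
```

In this model the earlier report corresponds to
ERASED_COVERS_NODE_TYPE (the type byte reads as 0xff, i.e. 255), and
the current one to ERASED_ELSEWHERE_IN_NODE, where the header is
intact but part of the compressed payload is 0xff.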

Unfortunately it's not repeatable enough for us to capture adequate
debug output - we've had several devices logging extensive debug
output via netconsole for months now, but none have recreated this
problem so far under those conditions.

-- 
Matthew L. Creech