ubifs_decompress: cannot decompress ...

Matthew L. Creech mlcreech at gmail.com
Wed Jun 8 13:50:24 EDT 2011


On Wed, Jun 8, 2011 at 10:11 AM, Artem Bityutskiy <dedekind1 at gmail.com> wrote:
>
> Yes, it does look like this LEB might be garbage-collected. But it does
> not have to be.
>
> Anyway, what I can suggest is that you do several things.
>
> 1. If you have many occurrences of this error, try to gather some
>   information about how the device was used, and whether it was uncleanly
>   power-cut. Remember, I have often seen embedded devices with an incorrect
>   reboot. When users reboot it "normally", it does not try to unmount
>   the FS-es cleanly and just jumps to some HW reset function.
>
>   You can verify this by rebooting normally and checking if UBIFS says
>   "recovery needed" or not. If it does - the reboot was not normal.
>

Yes, it currently reboots uncleanly (though it does do a "sync"
first).  I noticed this a while back, and the next release firmware
will have it fixed.  However, it doesn't make a huge difference to us,
because these devices are probably more likely to experience power
loss than a software reboot, in the field at least.
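
For anyone wanting to automate the check from point 1, here is a minimal
sketch in C. It scans kernel log text for lines that mention both "UBIFS"
and "recovery needed". The function names are hypothetical, and the exact
wording of the kernel message is an assumption based on the description
above, so adjust the strings to match what your kernel actually prints.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: true if one log line reports a UBIFS recovery.
 * The two substrings are assumptions about the message wording. */
bool line_shows_ubifs_recovery(const char *line)
{
    return strstr(line, "UBIFS") != NULL &&
           strstr(line, "recovery needed") != NULL;
}

/* Hypothetical helper: true if any line read from 'log' reports a UBIFS
 * recovery. 'log' could be a saved dmesg dump opened with fopen(). */
bool log_shows_ubifs_recovery(FILE *log)
{
    char line[512];

    assert(log != NULL);
    while (fgets(line, sizeof line, log) != NULL) {
        if (line_shows_ubifs_recovery(line))
            return true;
    }
    return false;
}
```

If this returns true for the log captured right after a "normal" reboot,
the file system was not unmounted cleanly.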

> 2. This error may be due to memory corruptions in some driver (e.g.,
>   wireless or video), due to issues in the mtd driver, etc. Try to
>   stress your system with slub/slab full checks enabled, and other
>   debugging features which you can find in the "hacking" section of
>   make menuconfig.
>

Will do.

> 3. If my theory is true, then what may help is adding a check in the
>   ubifs recovery function. The recovery ends with an ubifs_leb_change()
>   call. You need to check the last node there - is it full and correct?
>   If not, you should print a loud warning and information like leb dump
>   _before_ the change, and dump of the buffer which we are going to
>   write with ubifs_leb_change().
>
>   You'd probably need to deploy this check to the field if this issue
>   is not easy to reproduce. If you then have this info, you may be able
>   to fix the bug.
>

Great, I'll add this check and see if we get any hits.  Even if it
takes a while to hit it in the field, this would at least give us a
way to make some progress in finding the issue.
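
A rough, user-space sketch in C of the kind of check discussed in point 3:
walk the buffer that is about to be written and verify that the last node
is complete and has a valid CRC. This is not the kernel code. The function
and enum names are hypothetical. The sketch assumes the buffer holds only
back-to-back, 8-byte-aligned nodes (no padding nodes, no trailing 0xFF
bytes), and it relies on the UBIFS common header layout (little-endian
magic at offset 0, CRC at offset 4, node length at offset 16, with the CRC
covering everything from offset 8 onward).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NODE_MAGIC 0x06101831u /* UBIFS node magic number */
#define CH_SIZE    24u         /* size of the UBIFS common header */
#define CRC_INIT   0xFFFFFFFFu /* initial CRC value used by UBIFS */

/* Result of the check; hypothetical names for this sketch. */
enum last_node_status { NO_NODES, LAST_NODE_OK, LAST_NODE_TRUNCATED, LAST_NODE_BAD };

/* Read a little-endian 32-bit value from an unaligned pointer. */
static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Bitwise little-endian CRC-32 (polynomial 0xEDB88320), no final inversion. */
static uint32_t crc32_le(uint32_t crc, const uint8_t *p, size_t len)
{
    while (len--) {
        crc ^= *p++;
        for (int i = 0; i < 8; i++)
            crc = (crc >> 1) ^ ((crc & 1) ? 0xEDB88320u : 0);
    }
    return crc;
}

/* Walk the nodes in buf[0..len) and stop at the first node that is
 * cut short or fails the magic/length/CRC checks. */
enum last_node_status check_last_node(const uint8_t *buf, size_t len)
{
    enum last_node_status st = NO_NODES;
    size_t offs = 0;

    assert(buf != NULL);
    while (offs < len) {
        const uint8_t *node = buf + offs;
        uint32_t node_len;

        if (len - offs < CH_SIZE)
            return LAST_NODE_TRUNCATED; /* header itself is cut short */
        if (get_le32(node) != NODE_MAGIC)
            return LAST_NODE_BAD;
        node_len = get_le32(node + 16);
        if (node_len < CH_SIZE)
            return LAST_NODE_BAD;
        if (node_len > len - offs)
            return LAST_NODE_TRUNCATED; /* node body is cut short */
        if (crc32_le(CRC_INIT, node + 8, node_len - 8) != get_le32(node + 4))
            return LAST_NODE_BAD;
        offs += ((size_t)node_len + 7) & ~(size_t)7; /* nodes are 8-byte aligned */
        st = LAST_NODE_OK;
    }
    return st;
}
```

In the real recovery path, anything other than LAST_NODE_OK would be the
trigger for the loud warning and for dumping both the LEB and the buffer
before the write.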

> 4. Set-up power-cut emulation testing in your office.
>

I did this at one point - I have a programmable UPS, so I was able to
automate a test to turn outlet power off & on repeatedly while having
the device do some work.  It didn't seem to help reproduce the
problem, but it's worth trying again on a long-term basis (especially
with the change above to try & catch the corruption in the act).

Thanks again Artem.

-- 
Matthew L. Creech


