ubifs_decompress: cannot decompress ...
Matthew L. Creech
mlcreech at gmail.com
Wed Jun 8 13:50:24 EDT 2011
On Wed, Jun 8, 2011 at 10:11 AM, Artem Bityutskiy <dedekind1 at gmail.com> wrote:
>
> Yes, it does look like this LEB might be garbage-collected. But it does
> not have to be.
>
> Anyway, I can suggest several things to try.
>
> 1. If you have many occasions of such error, try to gather some
> information about how the device was used, and if it was uncleanly
> power-cut. Remember, I have often seen embedded devices with an incorrect
> reboot. When users reboot the device "normally", it does not try to unmount
> the file systems cleanly and just jumps to some HW reset function.
>
> You can verify this by rebooting normally and checking if UBIFS says
> "recovery needed" or not. If it does - the reboot was not normal.
>
Yes, it currently reboots uncleanly (though it does do a "sync"
first). I noticed this a while back, and the next release firmware
will have it fixed. However, it doesn't make a huge difference to us,
because these devices are probably more likely to experience power
loss than a software reboot, in the field at least.
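
For reference, the check described in item 1 can be automated by scanning the kernel log captured after a reboot. The following is only a minimal user-space sketch: the function name is hypothetical, and it assumes the log text (for example, saved dmesg output) is already in memory and that UBIFS prefixes its messages with "UBIFS".

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: returns true if any line of the given log text
 * mentions both "UBIFS" and "recovery needed", which would indicate that
 * the previous shutdown was not clean.
 */
bool ubifs_recovery_was_needed(const char *log)
{
    const char *line = log;

    while (line && *line) {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);
        char buf[512];

        /* Copy one line (truncated if very long) so strstr stays within it. */
        if (len >= sizeof(buf))
            len = sizeof(buf) - 1;
        memcpy(buf, line, len);
        buf[len] = '\0';

        if (strstr(buf, "UBIFS") && strstr(buf, "recovery needed"))
            return true;

        line = end ? end + 1 : NULL;
    }
    return false;
}
```

A test script could call this on the log collected after each "normal" reboot and flag any boot where it returns true.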

> 2. This error may be due to memory corruption in some driver (e.g.,
> wireless or video), due to issues in the mtd driver, etc. Try to
> stress your system with slub/slab full checks enabled, and other
> debugging features which you can find in the "hacking" section of
> make menuconfig.
>
Will do.
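
As a starting point for item 2, a kernel configuration fragment along these lines enables the slab checks and a few related options from the "Kernel hacking" menu. This is a sketch rather than a recommended set: which options exist depends on the kernel version, the architecture, and the chosen slab allocator, so treat the selection as an assumption to verify against the target tree.

```
# SLUB allocator: build in the debug code and turn the full checks on by default
CONFIG_SLUB_DEBUG=y
CONFIG_SLUB_DEBUG_ON=y

# If the kernel uses the SLAB allocator instead
CONFIG_DEBUG_SLAB=y

# Other memory-corruption checks (availability varies by architecture)
CONFIG_DEBUG_PAGEALLOC=y
CONFIG_DEBUG_LIST=y
```

These checks slow the system down noticeably, so they are better suited to stress testing in the lab than to field firmware.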

> 3. If my theory is true, then what may help is adding a check in the
> UBIFS recovery function. The recovery ends with a ubifs_leb_change()
> call. You need to check the last node there - is it full and correct?
> If not, you should print a loud warning along with information such as
> a dump of the LEB _before_ the change and a dump of the buffer which we
> are going to write with ubifs_leb_change().
>
> You'd probably need to deploy this check to the field if this issue
> is not easy to reproduce. Once you have this info, you may be able to
> fix the bug.
>
Great, I'll add this check and see if we get any hits. Even if it
takes a while to hit it in the field, this would at least give us a
way to make some progress in finding the issue.
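
The logic of the check in item 3 can be sketched in user space as below. This is a simplified model, not kernel code: the function names are hypothetical, the header layout (magic at offset 0, CRC at 4, node length at 16, 24-byte common header, CRC computed over everything after the first 8 bytes, 8-byte node alignment) is assumed from the UBIFS on-flash format, and padding nodes and 0xFF-filled space are ignored. It walks the nodes in the buffer that is about to be written and reports whether the last one is complete and has a matching CRC.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NODE_MAGIC 0x06101831u /* UBIFS node magic number */
#define CH_SZ      24          /* assumed size of the common node header */
#define CRC_INIT   0xFFFFFFFFu /* assumed CRC seed */

static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Bitwise little-endian CRC32 (polynomial 0xEDB88320), no final inversion. */
static uint32_t crc32_le_model(uint32_t crc, const uint8_t *p, size_t len)
{
    while (len--) {
        crc ^= *p++;
        for (int i = 0; i < 8; i++)
            crc = (crc >> 1) ^ (0xEDB88320u & -(crc & 1u));
    }
    return crc;
}

/*
 * Returns true if buf[0..used) consists of whole nodes, i.e. every node,
 * including the last one, has a valid magic, fits in the buffer, and has
 * a matching CRC. A false result is where the real check would print the
 * warning and dump both the old LEB and the new buffer.
 */
bool last_node_is_complete(const uint8_t *buf, size_t used)
{
    size_t offs = 0;

    if (used == 0)
        return false;

    while (offs < used) {
        if (used - offs < CH_SZ)
            return false; /* truncated header */
        if (get_le32(buf + offs) != NODE_MAGIC)
            return false; /* not a node */

        uint32_t len = get_le32(buf + offs + 16);
        if (len < CH_SZ || len > used - offs)
            return false; /* node runs past the end of the data */
        if (crc32_le_model(CRC_INIT, buf + offs + 8, len - 8) !=
            get_le32(buf + offs + 4))
            return false; /* corrupted node */

        offs += (len + 7) & ~(size_t)7; /* nodes are 8-byte aligned */
    }
    return true;
}

/* Test fixture: writes a node with payload_len bytes, returns aligned size. */
size_t write_test_node(uint8_t *dst, uint32_t payload_len)
{
    uint32_t len = CH_SZ + payload_len;

    memset(dst, 0, len);
    put_le32(dst, NODE_MAGIC);
    put_le32(dst + 16, len);
    for (uint32_t i = 0; i < payload_len; i++)
        dst[CH_SZ + i] = (uint8_t)i;
    put_le32(dst + 4, crc32_le_model(CRC_INIT, dst + 8, len - 8));
    return (len + 7) & ~(size_t)7;
}
```

In the kernel, the equivalent test would sit just before the ubifs_leb_change() call at the end of recovery. Keeping the check read-only means it can be deployed to the field without changing what recovery writes.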

> 4. Set up power-cut emulation testing in your office.
>
I did this at one point - I have a programmable UPS, so I was able to
automate a test to turn outlet power off & on repeatedly while having
the device do some work. It didn't seem to help reproduce the
problem, but it's worth trying again on a long-term basis (especially
with the change above to try & catch the corruption in the act).

Thanks again, Artem.

--
Matthew L. Creech