SquashFS mixed errors (decompression failed and others)
Vincent Wiemann
vincent.wiemann at ironai.com
Fri May 21 09:07:58 PDT 2021
On 5/21/21 3:58 PM, Koen Vandeputte wrote:
>
> On 21.05.21 13:19, Ibrahim Tachijian wrote:
>> Hello,
>>
>> We use approximately 10k IPQ40XX devices and we have noticed that
>> every time we run "sysupgrade -n" we lose approximately 1% of the
>> routers in the process.
>> After further investigation I'm almost confident that it is not the
>> sysupgrade process that is the culprit - so what I did was that I put
>> one test router into a reboot loop.
>>
>> This is what I do:
>>
>> Boot the router in a fresh state after a newly installed image.
>> The image contains a reboot loop that consists of a shell script that
>> runs every minute.
>>
>> The shell script tries to run a php-script which simply echoes "Hello
>> World". If the php-script exits normally then we reboot the router.
>>
>> However, if the php-script exits abnormally then the router stops and
>> does nothing other than inform me that there was a bus error that
>> prevented php from running the hello world script.
>>
>> When this process runs, the router reboots approximately 50 times
>> before it boots into a faulty state in which I see bus errors when I
>> try to run php scripts, for example.
>>
>>
>> Looking into dmesg you can see some errors such as,
>>
>> [10985.209438] SQUASHFS error: squashfs_read_data failed to read block 0x3a803e
>> [11045.218685] SQUASHFS error: xz decompression failed, data probably corrupt
>> [11045.218731] SQUASHFS error: squashfs_read_data failed to read block 0x3a803e
>> [11105.228157] SQUASHFS error: xz decompression failed, data probably corrupt
>> [11105.228203] SQUASHFS error: squashfs_read_data failed to read block 0x3a803e
>>
>> or
>>
>> [26218.687905] SQUASHFS error: Unable to read page, block 1b99a, size 10234
>> [26221.057472] SQUASHFS error: Unable to read data cache entry [1b99a]
>> [26221.057551] SQUASHFS error: Unable to read page, block 1b99a, size 10234
>> [26221.062926] SQUASHFS error: Unable to read data cache entry [1b99a]
>> [26221.069742] SQUASHFS error: Unable to read page, block 1b99a, size 10234
>> [26224.460239] SQUASHFS error: Unable to read data cache entry [1b99a]
>> [26224.460320] SQUASHFS error: Unable to read page, block 1b99a, size 10234
>>
>> or
>>
>> [62745.801178] SQUASHFS error: squashfs_read_data failed to read block 0x732ae2
>> [62773.347234] SQUASHFS error: xz decompression failed, data probably corrupt
>> [62773.347281] SQUASHFS error: squashfs_read_data failed to read block 0x732ae2
>> [62790.132661] SQUASHFS error: xz decompression failed, data probably corrupt
>> [62790.132706] SQUASHFS error: squashfs_read_data failed to read block 0x732ae2
>> [62790.216746] SQUASHFS error: xz decompression failed, data probably corrupt
>> [62790.216792] SQUASHFS error: squashfs_read_data failed to read block 0x732ae2
>> [62800.810525] SQUASHFS error: xz decompression failed, data probably corrupt
>> [62800.810570] SQUASHFS error: squashfs_read_data failed to read block 0x732ae2
>> [62828.336267] SQUASHFS error: xz decompression failed, data probably corrupt
>>
>>
>>
>> Now, you would assume that the squashfs-partition is broken - but if
>> this was the case then a reboot should not help. It does.
>> Rebooting the router after it boots in this faulty state fixes the issue.
>>
>> So approximately 1-2% of my reboots make the router go into this
>> faulty state.
>>
>> I am clueless as to how to investigate this issue further. For now my
>> workaround is to restart the router via a bash script if it
>> notices bus errors or I/O errors.
>>
>> Thanks
>>
> In the next kernel bump, the following patch is also present:
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=v5.10.38&id=2ed1d90162a0c0683ecbe0c4802187fa22d641c3
>
>
> I think it's worth a shot to retry the tests once it's bumped.
>
> Koen
>
My guess is that the error already happens when reading the flash.
Is your firmware (sysupgrade) bigger than 16MB?
So maybe it has to do with switching to 4-byte address mode...
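
To make the 16MB question concrete: 3-byte SPI NOR addresses can only
reach the first 2^24 bytes (16 MiB) of the flash, so anything stored
beyond that needs the wider addressing. Below is a rough Python sketch of
the arithmetic, not something I have run against your setup. The function
names are made up for illustration, and the rootfs partition offset used
in the example is a hypothetical value; take the real one from your boot
log or device tree. It also assumes the block numbers in the SQUASHFS
messages are byte offsets relative to the start of the squashfs partition.

```python
# 3-byte (24-bit) SPI NOR addresses reach only the first 16 MiB of flash.
ADDR_3BYTE_LIMIT = 1 << 24  # 16 MiB


def image_crosses_limit(flash_offset: int, image_size: int) -> bool:
    """True if any byte of the image lies beyond the 3-byte address range."""
    return flash_offset + image_size > ADDR_3BYTE_LIMIT


def needs_4byte_addressing(partition_offset: int, offset_in_partition: int) -> bool:
    """True if a given byte of a partition sits at or above the 16 MiB mark."""
    return partition_offset + offset_in_partition >= ADDR_3BYTE_LIMIT


if __name__ == "__main__":
    # Hypothetical rootfs offset, NOT taken from the report above.
    rootfs_offset = 0x180000
    for block in (0x3A803E, 0x732AE2):
        absolute = rootfs_offset + block
        print(hex(block), "->", hex(absolute),
              needs_4byte_addressing(rootfs_offset, block))
```

If the failing blocks all end up below the 16 MiB mark with your real
offsets, the addressing mode alone is a less likely explanation.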
Best,
Vincent