UBIFS corruption in empty space during mount
Barak Adam
BAdam at adva.com
Thu Oct 29 00:48:39 EDT 2020
Hi all,
We are facing a kernel panic in our legacy switches, similar to the one in the following post:
https://patchwork.ozlabs.org/project/linux-mtd/patch/loom.20120319T102527-948@post.gmane.org/
The corruption is detected while the root FS is being mounted, so it triggers a kernel panic during system init.
System description:
=================
Our system is a legacy one, based on a Marvell Cetus SoC with a raw 1 Gbit Micron NAND flash; NAND ECC is 8-bit.
We are using UBIFS on Linux 3.10.70.
The NAND driver is Marvell's "armada-nand" (mtd/nand/mvebu_nfc/nand_nfc.c), which is based on the PXA driver drivers/mtd/nand/pxa3xx_nand.c.
Running a script that power-cycles the system in an endless loop, we get this panic:
========================================================
UBIFS error (pid 1): ubifs_scan: corrupt empty space at LEB 3:7571
UBIFS error (pid 1): ubifs_scanned_corruption: corruption at LEB 3:7571
UBIFS error (pid 1): ubifs_scanned_corruption: first 8192 bytes from LEB 3:7571
UBIFS error (pid 1): ubifs_scan: LEB 3 scanning failed
VFS: Cannot open root device "ubi0:root" or unknown-block(0,0): error -117
Please append a correct "root=" boot option; here are the available partitions:
Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
============================================================
I have read several posts about UBIFS empty-space corruption.
Most of them recommend fixing the lower layers, i.e. the MTD or NAND drivers.
We had a similar issue in the past, during master node recovery, and I applied the following commits:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=730a43fbc135e593cc3de3b1b895e49c05c8e2dc
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=40cbe6eee97b706f27bcc4c6aa1018bbe4f1e577
But this time I think it happens during mount, while UBIFS is replaying the journal, which is a different scenario.
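The failing LEB number may itself be a clue. As far as I know, UBIFS keeps its superblock in LEB 0, the two master-node copies in LEBs 1 and 2, and starts the log at LEB 3 (see `UBIFS_SB_LNUM`, `UBIFS_MST_LNUM` and `UBIFS_LOG_LNUM` in fs/ubifs/ubifs-media.h). If that is right, LEB 3 is the first log LEB, which fits a failure during journal replay. The sketch below classifies a LEB number under that assumption; `classify_leb`, `leb_area_name`, `enum leb_area` and the `SKETCH_*` macros are hypothetical names for illustration, and `log_lebs` has to be taken from the actual superblock.

```c
/* Fixed UBIFS LEB layout, mirroring (as far as I know) fs/ubifs/ubifs-media.h.
 * The SKETCH_* names are hypothetical; the kernel uses UBIFS_*_LNUM. */
#define SKETCH_SB_LNUM  0                                   /* superblock LEB */
#define SKETCH_MST_LNUM (SKETCH_SB_LNUM + 1)                /* first master LEB */
#define SKETCH_MST_LEBS 2                                   /* two master copies */
#define SKETCH_LOG_LNUM (SKETCH_MST_LNUM + SKETCH_MST_LEBS) /* first log LEB (3) */

enum leb_area { LEB_INVALID, LEB_SUPERBLOCK, LEB_MASTER, LEB_LOG, LEB_AFTER_LOG };

/* Classify a LEB number. log_lebs must come from the real superblock;
 * everything past the log (LPT, orphans, main area) is lumped together. */
enum leb_area classify_leb(int lnum, int log_lebs)
{
    if (lnum < 0)
        return LEB_INVALID;
    if (lnum == SKETCH_SB_LNUM)
        return LEB_SUPERBLOCK;
    if (lnum < SKETCH_LOG_LNUM)
        return LEB_MASTER;
    if (lnum < SKETCH_LOG_LNUM + log_lebs)
        return LEB_LOG;
    return LEB_AFTER_LOG;
}

/* Human-readable name for a classified area. */
const char *leb_area_name(enum leb_area a)
{
    switch (a) {
    case LEB_SUPERBLOCK: return "superblock";
    case LEB_MASTER:     return "master";
    case LEB_LOG:        return "log";
    case LEB_AFTER_LOG:  return "LPT/orphan/main area";
    default:             return "invalid";
    }
}
```

With any valid log size (at least one log LEB), `classify_leb(3, log_lebs)` falls in the log, which is why I suspect the replay path rather than the main area.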
As far as I understand, this is the call stack leading to the panic:
[<c0015050>] (unwind_backtrace+0x0/0xf8) from [<c00115f4>] (show_stack+0x10/0x18)
[<c00115f4>] (show_stack+0x10/0x18) from [<c0196634>] (ubifs_scan+0x29c/0x378)
[<c0196634>] (ubifs_scan+0x29c/0x378) from [<c0196aa4>] (ubifs_replay_journal+0x104/0x1380)
[<c0196aa4>] (ubifs_replay_journal+0x104/0x1380) from [<c018caf8>] (ubifs_mount+0xe88/0x15c8)
[<c018caf8>] (ubifs_mount+0xe88/0x15c8) from [<c00a0830>] (mount_fs+0x14/0xc8)
[<c00a0830>] (mount_fs+0x14/0xc8) from [<c00b7620>] (vfs_kern_mount+0x4c/0xc4)
[<c00b7620>] (vfs_kern_mount+0x4c/0xc4) from [<c00b992c>] (do_mount+0x1ac/0x8e8)
[<c00b992c>] (do_mount+0x1ac/0x8e8) from [<c00ba0ec>] (SyS_mount+0x84/0xbc)
[<c00ba0ec>] (SyS_mount+0x84/0xbc) from [<c0674ee0>] (mount_block_root+0x104/0x22c)
[<c0674ee0>] (mount_block_root+0x104/0x22c) from [<c06751a4>] (prepare_namespace+0x90/0x194)
[<c06751a4>] (prepare_namespace+0x90/0x194) from [<c0674bf0>] (kernel_init_freeable+0x180/0x1c8)
[<c0674bf0>] (kernel_init_freeable+0x180/0x1c8) from [<c04de5e8>] (kernel_init+0x8/0x154)
[<c04de5e8>] (kernel_init+0x8/0x154) from [<c000dfd8>] (ret_from_fork+0x14/0x3c)
ubifs_scan (fs/ubifs) is called to scan the LEBs.
It detects the corrupted empty space, dumps the corruption messages shown above, and returns -EUCLEAN (error -117 in the log), which makes the root mount fail and the kernel panic.
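To make the check concrete: after the last valid node in a LEB, UBIFS expects the rest of the LEB to be erased flash, i.e. every byte 0xFF, and "corrupt empty space at LEB 3:7571" reports the offset of the first byte that is not. The following is a simplified user-space sketch of that rule, not the kernel code (which, as far as I recall, compares 32-bit words before falling back to single bytes); `first_dirty_offset` and `check_empty_space` are hypothetical names.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Return the offset of the first byte in leb[offs..len) that is not 0xFF
 * (i.e. not erased NAND), or -1 if that whole range is still erased. */
long first_dirty_offset(const uint8_t *leb, size_t offs, size_t len)
{
    for (size_t i = offs; i < len; i++)
        if (leb[i] != 0xFF)
            return (long)i;
    return -1;
}

/* Simplified model of the empty-space rule: 0 if the space after the last
 * node is clean, -EUCLEAN (117 on ARM and x86 Linux) if it is not.
 * *bad receives the offset that would be printed as "LEB n:<offset>". */
int check_empty_space(const uint8_t *leb, size_t offs, size_t len, long *bad)
{
    *bad = first_dirty_offset(leb, offs, len);
    return *bad < 0 ? 0 : -EUCLEAN;
}
```

Running a raw dump of the failing LEB through a check like this could help show whether the non-0xFF bytes sit right after the last written data (an interrupted write) or further into the supposedly empty area.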
ubifs_scan:
--> calls ubifs_start_scan (fs/ubifs)
--> which calls ubifs_leb_read (fs/ubifs)
--> which calls ubi_read (mtd/ubi.h)
--> which calls ubi_leb_read (mtd/ubi)
ubi_leb_read calls the lower-layer NAND driver functions, but ultimately returns -EBADMSG, indicating that the MTD driver has detected a data integrity problem (an unrecoverable ECC checksum mismatch, in the case of NAND).
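Since the -EBADMSG originates below UBI, one way to confirm it from user space is to read the underlying MTD partition directly and watch the MTD ECC counters. As far as I know, mtdchar still copies the data to user space on an ECC failure instead of failing read(), so the `failed` counter returned by the `ECCGETSTATS` ioctl (from <mtd/mtd-abi.h>) is the more reliable signal. This sketch assumes the NAND driver updates `mtd->ecc_stats` on uncorrectable errors and that the caller has already opened the right /dev/mtdX; `ecc_delta` and `read_and_count_ecc` are hypothetical helpers, and mapping LEB 3 to its physical eraseblock offset is not shown.

```c
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>
#include <mtd/mtd-abi.h>

/* Difference between two ECC counter snapshots taken with ECCGETSTATS. */
void ecc_delta(const struct mtd_ecc_stats *before,
               const struct mtd_ecc_stats *after,
               uint32_t *corrected, uint32_t *failed)
{
    *corrected = after->corrected - before->corrected;
    *failed = after->failed - before->failed;
}

/* Read len bytes at offset from an open /dev/mtdX descriptor and report how
 * many corrected/uncorrectable ECC events the driver counted meanwhile.
 * Returns the number of bytes read, or -1 on any error (errno is set). */
ssize_t read_and_count_ecc(int fd, off_t offset, void *buf, size_t len,
                           uint32_t *corrected, uint32_t *failed)
{
    struct mtd_ecc_stats before, after;
    ssize_t n;

    if (ioctl(fd, ECCGETSTATS, &before) < 0)
        return -1;
    if (lseek(fd, offset, SEEK_SET) < 0)
        return -1;
    n = read(fd, buf, len);
    if (n < 0)
        return -1;
    if (ioctl(fd, ECCGETSTATS, &after) < 0)
        return -1;
    ecc_delta(&before, &after, corrected, failed);
    return n;
}
```

For example, after `int fd = open("/dev/mtd2", O_RDONLY);` (the partition number is a placeholder), a nonzero `failed` delta while reading the eraseblock that holds LEB 3 would confirm that the NAND driver reports an uncorrectable ECC error for that block.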
I am still debugging and looking for any solution or workaround.
Thanks !
Barak