Intermittent MTD attachment failures

Colin Foster colin.foster at in-advantage.com
Wed Nov 23 16:36:26 PST 2022


Hello MTD mailing list world,

I'm getting up and running with an MTD / UBI / UBIFS NAND device on a
Phytec PCM049 SOM running an OMAP 4460. A vendor BSP was offered in the
3.4 kernel era, and I'm trying to bring this into the new age.

I'm seeing intermittent failures to attach my MTD device. My hunch is
that I'm doing something horribly wrong... that has been the case
historically. But I'm not sure what.


I'm using U-Boot (MLO and u-boot.img in raw NAND) with RAUC, and a
redundant U-Boot environment partition pair, alongside an A/B
partition in the UBI.

Every once in a while, the kernel panics because it can't find a root
filesystem. Interestingly, U-Boot doesn't seem to have an issue. I'm
curious why that might be.


Is there something that I'm doing that might cause this error to be
happening, and is there something I can do to avoid it? Boot logs below.




From a bad boot cycle, U-Boot comes up as I'd expect:

U-Boot 2022.10 (Nov 22 2022 - 11:57:31 -0800)

CPU  : OMAP4460-GP ES1.1
Model: Phytec PCM-959 Eval Board
Board: OMAP4 PCM959
DRAM:  1 GiB
Core:  111 devices, 10 uclasses, devicetree: separate
NAND:  device found, Manufacturer ID: 0x2c, Chip ID: 0xb3
Micron MT29F8G16ADBDAH4
1024 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64
1024 MiB
MMC:   OMAP SD/MMC: 0
Loading Environment from UBI... Scanning device for bad blocks
Bad eraseblock 1027 at 0x000008060000
ubi0: attaching mtd4
ubi0: scanning is finished
ubi0: attached mtd4 (name "main", size 508 MiB)
ubi0: PEB size: 131072 bytes (128 KiB), LEB size: 129024 bytes
ubi0: min./max. I/O unit sizes: 2048/2048, sub-page size 512
ubi0: VID header offset: 512 (aligned 512), data offset: 2048
ubi0: good PEBs: 4070, bad PEBs: 1, corrupted PEBs: 0
ubi0: user volume: 4, internal volumes: 1, max. volumes count: 128
ubi0: max/mean erase counter: 3/1, WL threshold: 4096, image sequence number: 461569733
ubi0: available PEBs: 779, total reserved PEBs: 3291, PEBs reserved for bad PEB handling: 159
Read 262144 bytes from volume env1 to bde841c0
Read 262144 bytes from volume env2 to bdec4200
OK
Net:   No ethernet found.
Hit any key to stop autoboot:  0
Found valid slot B, 5 attempts remaining
UBI partition 'main' already selected
UBIFS (ubi0:3): UBIFS: mounted UBI device 0, volume 3, name "rootfs-b", R/O mode
UBIFS (ubi0:3): LEB size: 129024 bytes (126 KiB), min./max. I/O unit sizes: 2048 bytes/2048 bytes
UBIFS (ubi0:3): FS size: 199987200 bytes (190 MiB, 1550 LEBs), journal size 9033728 bytes (8 MiB, 71 LEBs)
UBIFS (ubi0:3): reserved for root: 0 bytes (0 KiB)
UBIFS (ubi0:3): media format: w4/r0 (latest is w4/r0), UUID 27C99F76-8955-49A0-81C5-115DDE0BC3D8, small LPT model
Saving Environment to UBI... UBI partition 'main' already selected
Writing to redundant UBI... done
OK
Loading kernel
Loading file 'boot/zImage' to addr 0x82000000...
Done
Loading file 'boot/omap4-phytec-pcm-959.dtb' to addr 0x88000000...
Done
Unmounting UBIFS volume rootfs-b!
Kernel image @ 0x82000000 [ 0x000000 - 0x6e0670 ]
## Flattened Device Tree blob at 88000000
   Booting using the fdt blob at 0x88000000
   Loading Device Tree to 8ffe6000, end 8ffffa63 ... OK

Starting kernel ...



The kernel comes up with this command line:

[    0.000000] Booting Linux on physical CPU 0x0
[    0.000000] Linux version 6.1.0-rc6 (colin at colin-ia-desktop) (arm-buildroot-linux-uclibcgnueabihf-gcc.br_real (Buildroot 2022.05.1) 10.3.0, GNU ld (GNU Binutils) 2.37) #1 SMP PREEMPT Tue Nov 22 14:14:49 PST 2022
[    0.000000] CPU: ARMv7 Processor [412fc09a] revision 10 (ARMv7), cr=10c5387d
[    0.000000] CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing instruction cache
[    0.000000] OF: fdt: Machine model: Phytec PCM-959 Eval Board
[    0.000000] Memory policy: Data cache writealloc
[    0.000000] Reserved memory: created CMA memory pool at 0x98000000, size 8 MiB
[    0.000000] OF: reserved mem: initialized node dsp-memory at 98000000, compatible id shared-dma-pool
[    0.000000] Reserved memory: created CMA memory pool at 0x98800000, size 112 MiB
[    0.000000] OF: reserved mem: initialized node ipu-memory at 98800000, compatible id shared-dma-pool
[    0.000000] cma: Reserved 16 MiB at 0xbf000000
[    0.000000] OMAP4: Map 0xafe00000 to (ptrval) for dram barrier
[    0.000000] Zone ranges:
[    0.000000]   Normal   [mem 0x0000000080000000-0x00000000afdfffff]
[    0.000000]   HighMem  [mem 0x00000000afe00000-0x00000000bfffffff]
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000080000000-0x00000000afdfffff]
[    0.000000]   node   0: [mem 0x00000000b0000000-0x00000000bfffffff]
[    0.000000] Initmem setup node 0 [mem 0x0000000080000000-0x00000000bfffffff]
[    0.000000] On node 0, zone HighMem: 512 pages in unavailable ranges
[    0.000000] OMAP4460 ES1.1
[    0.000000] percpu: Embedded 16 pages/cpu s34708 r8192 d22636 u65536
[    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 259908
[    0.000000] Kernel command line: console=ttyS2,115200 mtdparts=nandflash:0x20000(xload_raw),0x180000(u-boot),0x180000(u-boot-2),0x1fce0000(main) ubi.mtd=3 root=ubi0:rootfs-b rootfstype=ubifs rw rauc.slot=B
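
As a cross-check on the mtdparts argument above, here is a minimal Python
sketch (not part of the original boot setup; the function name
parse_mtdparts is made up for illustration) that turns the size list into
start/end offsets. It assumes only the plain size(name) form used here and
does not handle the other forms mtdparts accepts, such as explicit offsets
or size suffixes.

```python
import re

# mtdparts value copied from the kernel command line above.
CMDLINE_MTDPARTS = (
    "nandflash:0x20000(xload_raw),0x180000(u-boot),"
    "0x180000(u-boot-2),0x1fce0000(main)"
)


def parse_mtdparts(spec):
    """Return (mtd_id, [(name, start, end), ...]) for a simple mtdparts spec.

    Partitions are assumed to be laid out back to back starting at offset 0.
    """
    mtd_id, _, parts = spec.partition(":")
    layout = []
    offset = 0
    for item in parts.split(","):
        match = re.fullmatch(r"(0x[0-9a-fA-F]+|\d+)\(([^)]+)\)", item.strip())
        if match is None:
            raise ValueError(f"unsupported partition spec: {item!r}")
        size = int(match.group(1), 0)  # base 0 accepts hex and decimal
        layout.append((match.group(2), offset, offset + size))
        offset += size
    return mtd_id, layout


mtd_id, layout = parse_mtdparts(CMDLINE_MTDPARTS)
for index, (name, start, end) in enumerate(layout):
    print(f"mtd{index}: 0x{start:012x}-0x{end:012x} : {name}")
```

With these values, "main" comes out as the fourth partition (index 3),
spanning 0x320000 to 0x20000000, which agrees with ubi.mtd=3 and with the
partition ranges the kernel prints further down.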


And shortly thereafter, I see an error -110 (ETIMEDOUT) while checking
if a PEB is bad:


[    2.165771] nand: device found, Manufacturer ID: 0x2c, Chip ID: 0xb3
[    2.165771] nand: Micron MT29F8G16ADBDAH4
[    2.176239] nand: 1024 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64
[    2.176239] nand: using OMAP_ECC_BCH8_CODE_HW ECC scheme
[    2.189483] 4 fixed-partitions partitions found on MTD device omap2-nand.0
[    2.196624] Creating 4 MTD partitions on "omap2-nand.0":
[    2.196624] 0x000000000000-0x000000020000 : "xload_raw"
[    2.209533] 0x000000020000-0x0000001a0000 : "u-boot"
[    2.216369] 0x0000001a0000-0x000000320000 : "u-boot-2"
[    2.225128] 0x000000320000-0x000020000000 : "main"
[    2.454742] ubi0: attaching mtd3
[    3.195129] ubi0 error: ubi_io_is_bad: error -110 while checking if PEB 4037 is bad
[    3.205230] ubi0 error: ubi_attach_mtd_dev: failed to attach mtd3, error -110
[    3.205230] UBI error: cannot attach mtd3
[    3.223907] l4-secure-clkctrl:0038:0: failed to disable
[    3.229705] VFS: Cannot open root device "ubi0:rootfs-b" or unknown-block(0,0): error -19
[    3.238159] Please append a correct "root=" boot option; here are the available partitions:
[    3.238159] 0100           16384 ram0
[    3.246582]  (driver?)
[    3.246582] 0101           16384 ram1
[    3.246582]  (driver?)
[    3.258972] 0102           16384 ram2
[    3.259002]  (driver?)
[    3.265197] 0103           16384 ram3
[    3.265197]  (driver?)
[    3.265197] 0104           16384 ram4
[    3.271362]  (driver?)
[    3.277496] 0105           16384 ram5
[    3.277496]  (driver?)
[    3.281311] 0106           16384 ram6
[    3.283691]  (driver?)
[    3.283691] 0107           16384 ram7
[    3.289855]  (driver?)
[    3.296051] 0108           16384 ram8
[    3.296051]  (driver?)
[    3.296051] 0109           16384 ram9
[    3.302215]  (driver?)
[    3.302215] 010a           16384 ram10
[    3.302215]  (driver?)
[    3.314636] 010b           16384 ram11
[    3.314636]  (driver?)
[    3.320892] 010c           16384 ram12
[    3.320922]  (driver?)
[    3.327178] 010d           16384 ram13
[    3.327178]  (driver?)
[    3.327178] 010e           16384 ram14
[    3.333435]  (driver?)
[    3.339691] 010f           16384 ram15
[    3.339691]  (driver?)
[    3.346038] 1f00             128 mtdblock0
[    3.346038]  (driver?)
[    3.352630] 1f01            1536 mtdblock1
[    3.352630]  (driver?)
[    3.356872] 1f02            1536 mtdblock2
[    3.356872]  (driver?)
[    3.365844] 1f03          521088 mtdblock3
[    3.365844]  (driver?)
[    3.365844] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
[    3.372436] CPU0: stopping
[    3.383483] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.1.0-rc6 #1
[    3.383483] Hardware name: Generic OMAP4 (Flattened Device Tree)
[    3.389709]  unwind_backtrace from show_stack+0x18/0x1c
[    3.395751]  show_stack from dump_stack_lvl+0x58/0x70
[    3.395751]  dump_stack_lvl from do_handle_IPI+0x320/0x378
[    3.411682]  do_handle_IPI from ipi_handler+0x1c/0x28
[    3.411682]  ipi_handler from handle_percpu_devid_irq+0x9c/0x1ec
[    3.411682]  handle_percpu_devid_irq from handle_irq_desc+0x24/0x34
[    3.411682]  handle_irq_desc from gic_handle_irq+0x74/0xa8
[    3.434692]  gic_handle_irq from generic_handle_arch_irq+0x48/0xa8
[    3.434692]  generic_handle_arch_irq from __irq_svc+0x90/0xd4
[    3.434692] Exception stack(0xc1001e98 to 0xc1001ee0)
[    3.446716] 1e80:                                                       c0898bcc 00000000
[    3.446716] 1ea0: 2e4bb000 600001d3 00000000 c100fe00 00000000 c56d0372 ef462d38 00000000
[    3.460021] 1ec0: 00000000 00000000 00000001 c1001ee8 c025d168 c0898bd0 60000153 ffffffff
[    3.460021]  __irq_svc from cpuidle_enter_state+0x358/0x4b8
[    3.482086]  cpuidle_enter_state from cpuidle_enter_state_coupled+0x174/0x484
[    3.489288]  cpuidle_enter_state_coupled from cpuidle_enter+0x44/0x5c
[    3.489288]  cpuidle_enter from do_idle+0x1ec/0x2c4
[    3.500701]  do_idle from cpu_startup_entry+0x20/0x24
[    3.505798]  cpu_startup_entry from rest_init+0xbc/0xd8
[    3.511077]  rest_init from arch_post_acpi_subsys_init+0x0/0x18
[    3.511077] ---[ end Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0) ]---
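
The two numeric error codes in the failed boot above can be decoded with a
small lookup. This is only a sketch: the table is a hand-copied subset of
the Linux errno numbering (the generic values, which ARM uses), hardcoded
so the result does not depend on the machine running the script, and the
helper name decode_kernel_error is made up for illustration.

```python
# Hand-copied subset of Linux errno values (generic numbering).
LINUX_ERRNO = {
    19: ("ENODEV", "No such device"),
    72: ("EMULTIHOP", "Multihop attempted"),
    110: ("ETIMEDOUT", "Connection timed out"),
}


def decode_kernel_error(code):
    """Map a negative kernel return code such as -110 to its errno name."""
    number = abs(code)
    name, text = LINUX_ERRNO.get(number, ("UNKNOWN", "not in this table"))
    return f"{code}: {name} ({text})"


# -110 comes from ubi_io_is_bad / ubi_attach_mtd_dev, -19 from the VFS line.
for code in (-110, -19):
    print(decode_kernel_error(code))
```

So the attach failure is a timeout (-110 is ETIMEDOUT), and the -19 from
the VFS is ENODEV, which follows from ubi0 never having been created.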



I simply reboot, and the next time the kernel mount succeeds:

[    2.108001] nand: device found, Manufacturer ID: 0x2c, Chip ID: 0xb3
[    2.108001] nand: Micron MT29F8G16ADBDAH4
[    2.118957] nand: 1024 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64
[    2.127044] nand: using OMAP_ECC_BCH8_CODE_HW ECC scheme
[    2.131225] 4 fixed-partitions partitions found on MTD device omap2-nand.0
[    2.139617] Creating 4 MTD partitions on "omap2-nand.0":
[    2.145416] 0x000000000000-0x000000020000 : "xload_raw"
[    2.152923] 0x000000020000-0x0000001a0000 : "u-boot"
[    2.159606] 0x0000001a0000-0x000000320000 : "u-boot-2"
[    2.167480] 0x000000320000-0x000020000000 : "main"
[    2.397064] ubi0: attaching mtd3
[    3.143463] ubi0: scanning is finished
[    3.159973] ubi0: attached mtd3 (name "main", size 508 MiB)
[    3.165618] ubi0: PEB size: 131072 bytes (128 KiB), LEB size: 129024 bytes
[    3.172546] ubi0: min./max. I/O unit sizes: 2048/2048, sub-page size 512
[    3.179321] ubi0: VID header offset: 512 (aligned 512), data offset: 2048
[    3.186157] ubi0: good PEBs: 4070, bad PEBs: 1, corrupted PEBs: 0
[    3.192291] ubi0: user volume: 4, internal volumes: 1, max. volumes count: 128
[    3.199584] ubi0: max/mean erase counter: 3/1, WL threshold: 4096, image sequence number: 461569733
[    3.208709] ubi0: available PEBs: 779, total reserved PEBs: 3291, PEBs reserved for bad PEB handling: 159
[    3.218383] ubi0: background thread "ubi_bgt0d" started, PID 173
[    3.232025] l4-secure-clkctrl:0038:0: failed to disable
[    3.237823] UBIFS (ubi0:3): Mounting in unauthenticated mode
[    3.245208] UBIFS (ubi0:3): background thread "ubifs_bgt0_3" started, PID 174
[    3.279571] UBIFS (ubi0:3): UBIFS: mounted UBI device 0, volume 3, name "rootfs-b"
[    3.279571] UBIFS (ubi0:3): LEB size: 129024 bytes (126 KiB), min./max. I/O unit sizes: 2048 bytes/2048 bytes
[    3.300415] UBIFS (ubi0:3): FS size: 199987200 bytes (190 MiB, 1550 LEBs), max 2048 LEBs, journal size 9033728 bytes (8 MiB, 71 LEBs)
[    3.312530] UBIFS (ubi0:3): reserved for root: 0 bytes (0 KiB)
[    3.318420] UBIFS (ubi0:3): media format: w4/r0 (latest is w5/r0), UUID 27C99F76-8955-49A0-81C5-115DDE0BC3D8, small LPT model
[    3.329925] VFS: Mounted root (ubifs filesystem) on device 0:16.
[    3.336700] devtmpfs: mounted
[    3.345489] Freeing unused kernel image (initmem) memory: 1024K
[    3.352996] Run /sbin/init as init process
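
For reference, the numbers in the attach summary above are self-consistent,
and PEB 4037 from the failed boot lies inside the "main" partition. The
Python sketch below redoes that arithmetic. All constants are copied from
the logs; the helper names are made up for illustration, and the flash
offset assumes PEBs are numbered from the start of the "main" partition.

```python
PEB_SIZE = 131072            # "PEB size: 131072 bytes (128 KiB)"
DATA_OFFSET = 2048           # "data offset: 2048"
MAIN_START = 0x000000320000  # start of "main" in the partition table
MAIN_END = 0x000020000000    # end of "main" in the partition table


def leb_size(peb_size, data_offset):
    """Usable bytes per eraseblock once the UBI headers are set aside."""
    return peb_size - data_offset


def peb_count(start, end, peb_size):
    """Number of physical eraseblocks in a partition."""
    return (end - start) // peb_size


def peb_flash_offset(peb, start, peb_size):
    """Absolute flash offset of a PEB, counting from the partition start."""
    return start + peb * peb_size


total_pebs = peb_count(MAIN_START, MAIN_END, PEB_SIZE)
print("LEB size:", leb_size(PEB_SIZE, DATA_OFFSET))   # log says 129024
print("total PEBs:", total_pebs)                      # log: 4070 good + 1 bad
print("available + reserved:", 779 + 3291)            # log: 4070 good PEBs
print("PEB 4037 at:", hex(peb_flash_offset(4037, MAIN_START, PEB_SIZE)))
```

PEB 4037 is a valid index (below the total of 4071) near the end of the
partition, so the timeout is not caused by an out-of-range block number.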


The bootloader was able to attach the UBI device and mount rootfs-b to
get the kernel and devicetree, but the kernel then fails to attach the
same partition. Nothing should have written anything in between, so I
feel there shouldn't have been an opportunity to corrupt anything. Any
suggestions on where to look?



Thank you very much,

Colin Foster


