UBIFS corruption after power cut - possibly unstable bits issue?
Tim Harvey
tharvey at gateworks.com
Mon Nov 2 12:31:11 PST 2015
On Mon, Nov 2, 2015 at 12:27 PM, Tim Harvey <tharvey at gateworks.com> wrote:
> On Tue, Oct 27, 2015 at 12:52 PM, Richard Weinberger <richard at nod.at> wrote:
>> Tim,
>>
>> Am 27.10.2015 um 20:01 schrieb Tim Harvey:
>>> I'm not understanding what is making you say that the issue I
>>> encountered is 'not' the unstable bits issue described at
>>> http://www.linux-mtd.infradead.org/doc/ubifs.html#L_unstable_bits? My
>>> understanding is that the 'unstable bit' issue refers to bits which
>>> are truly unstable and can read either way each and every read due to
>>> not getting properly erased/written.
>>
>> You are right. I was sorting out the unstable bits issue a bit too
>> early. I'm sorry.
>> Let's double check. Can you enable UBI verbose logging while testing?
>> Such that we can see which blocks were written/erased while the power cut
>> happened?
>>
>
> Richard,
>
> I have 2 IMX6 based boards that I've recreated a UBI/UBIFS NAND
> corruption using Linux 4.2 with dynamic debug enabled for
> ubi_io_write. This particular test cycle consisted of 5 different
> boards and had over 40,000 boot/power-cuts before corruption occurred.
> In each case the corruption is in a slightly different area.
>
> The boards remain in a corrupted state which will allow me to get
> further debug info.
>
> Board1 looks like this:
>
> [ 0.000000] Linux version 4.2.0-00015-g64291f7 (tharvey at tharvey)
> (gcc version 4.8.3 (OpenWrt/Linaro GCC 4.8-2014.04 r43990) ) #195 SMP
> Thu Oct 29 14:27:36 PDT 2015
> [ 0.000000] CPU: ARMv7 Processor [412fc09a] revision 10 (ARMv7), cr=10c5387d
> [ 0.000000] CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing
> instruction cache
> [ 0.000000] Machine model: Gateworks Ventana i.MX6 Dual/Quad GW52XX
> ...
> [ 0.000000] Kernel command line: console=ttymxc1,115200
> root=ubi0:rootfs ubi.mtd=2 rootfstype=ubifs dyndbg="func ubi_io_write
> +p" debug
> ...
> [ 2.045065] nand: device found, Manufacturer ID: 0x2c, Chip ID: 0xd5
> [ 2.051477] nand: Micron MT29F16G08ADACAH4
> [ 2.055590] nand: 2048 MiB, SLC, erase size: 256 KiB, page size:
> 4096, OOB size: 224
> [ 2.087400] gpmi-nand 112000.gpmi-nand: enable the asynchronous EDO mode 5
> [ 2.094321] Scanning device for bad blocks
> ...
> [ 3.704946] 3 ofpart partitions found on MTD device gpmi-nand
> [ 3.710735] Creating 3 MTD partitions on "gpmi-nand":
> [ 3.715823] 0x000000000000-0x000001000000 : "uboot"
> [ 3.725088] 0x000001000000-0x000001100000 : "env"
> [ 3.733000] 0x000001100000-0x000080000000 : "rootfs"
> [ 3.741880] gpmi-nand 112000.gpmi-nand: driver registered.
> ...
> [ 4.365104] ubi0: attaching mtd2
> [ 8.418862] ubi0: scanning is finished
> [ 8.445084] ubi0: attached mtd2 (name "rootfs", size 2031 MiB)
> [ 8.450977] ubi0: PEB size: 262144 bytes (256 KiB), LEB size: 253952 bytes
> [ 8.457864] ubi0: min./max. I/O unit sizes: 4096/4096, sub-page size 4096
> [ 8.464676] ubi0: VID header offset: 4096 (aligned 4096), data offset: 8192
> [ 8.471666] ubi0: good PEBs: 8124, bad PEBs: 0, corrupted PEBs: 0
> [ 8.477767] ubi0: user volume: 1, internal volumes: 1, max. volumes
> count: 128
> [ 8.485015] ubi0: max/mean erase counter: 7/3, WL threshold: 4096,
> image sequence number: 42283325
> [ 8.494000] ubi0: available PEBs: 0, total reserved PEBs: 8124,
> PEBs reserved for bad PEB handling: 160
> [ 8.503610] ubi0: background thread "ubi_bgt0d" started, PID 81
> ...
> [ 8.615672] UBIFS (ubi0:0): recovery needed
> [ 8.657083] UBIFS (ubi0:0): recovery deferred
> [ 8.661965] UBIFS (ubi0:0): UBIFS: mounted UBI device 0, volume 0,
> name "rootfs", R/O mode
> [ 8.670290] UBIFS (ubi0:0): LEB size: 253952 bytes (248 KiB),
> min./max. I/O unit sizes: 4096 bytes/4096 bytes
> [ 8.680248] UBIFS (ubi0:0): FS size: 2018918400 bytes (1925 MiB,
> 7950 LEBs), journal size 9404416 bytes (8 MiB, 38 LEBs)
> [ 8.691145] UBIFS (ubi0:0): reserved for root: 0 bytes (0 KiB)
> [ 8.697001] UBIFS (ubi0:0): media format: w4/r0 (latest is w4/r0),
> UUID BB5E5EB5-6A21-48AA-9C31-DE6B4A32ABAD, small LPT model
> [ 8.709784] VFS: Mounted root (ubifs filesystem) readonly on device 0:14.
> [ 8.719451] devtmpfs: mounted
> [ 8.723002] Freeing unused kernel memory: 420K (80aa5000 - 80b0e000)
> INIT: version 2.88 booting
> Starting udev
> [ 9.548551] udevd[114]: starting version 182
> [ 10.018775] UBIFS (ubi0:0): completing deferred recovery
> [ 10.043624] UBI DBG io (pid 145): write 4096 bytes to PEB 2253:4096
> [ 10.054878] UBI DBG io (pid 145): write 94208 bytes to PEB 2253:8192
> [ 10.096644] UBI DBG io (pid 81): write 4096 bytes to PEB 2250:0
> [ 10.143314] UBI DBG io (pid 145): write 4096 bytes to PEB 2254:4096
> [ 10.153917] UBI DBG io (pid 145): write 184320 bytes to PEB 2254:8192
> ^^^^^ PEB 2254 is the one that becomes un-readable on the next boot
>
> [ 10.186288] UBI DBG io (pid 81): write 4096 bytes to PEB 2251:0
> [ 10.204620] UBIFS (ubi0:0): background thread "ubifs_bgt0_0" started, PID 173
> [ 10.213375] UBI DBG io (pid 145): write 4096 bytes to PEB 2255:4096
> [ 10.223864] UBI DBG io (pid 145): write 4096 bytes to PEB 2255:8192
> [ 10.239935] UBI DBG io (pid 145): write 4096 bytes to PEB 2237:32768
> [ 10.261787] UBI DBG io (pid 145): write 4096 bytes to PEB 2184:102400
> [ 10.272978] UBI DBG io (pid 145): write 4096 bytes to PEB 2173:114688
> [ 10.285948] UBI DBG io (pid 145): write 4096 bytes to PEB 2174:114688
> [ 10.296027] UBIFS (ubi0:0): deferred recovery completed
> [ 10.299453] UBI DBG io (pid 81): write 4096 bytes to PEB 2252:0
> bootlogd: cannot allocate pseudo tty: No such file or directory
> ALSA: Restoring mixer settings...
> Mon Oct 26 16:37:16 UTC 2015
> INIT: Entering runlevel: 5
> Configuring network interfaces... [ 11.471755] fec 2188000.ethernet
> eth0: Freescale FEC PHY driver [Generic PHY]
> (mii_bus:phy_addr=2188000.ethernet:00, irq=-1)
> [ 11.483499] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
> udhcpc (v1.23.1) started
> Sending discover...
> Sending discover...
> [ 15.000462] UBI DBG io (pid 173): write 4096 bytes to PEB 2253:102400
> [ 16.472720] fec 2188000.ethernet eth0: Link is Up - 1Gbps/Full -
> flow control rx/tx
> [ 16.480975] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
> Sending discover...
> Sending select for 192.168.1.1...
> Lease of 192.168.1.1 obtained, lease time 21600
> /etc/udhcpc.d/50default: Adding DNS 192.168.1.100
> done.
>
> <power cut>
>
> [ 0.000000] Booting Linux on physical CPU 0x0
> [ 0.000000] Linux version 4.2.0-00015-g64291f7 (tharvey at tharvey)
> (gcc version 4.8.3 (OpenWrt/Linaro GCC 4.8-2014.04 r43990) ) #195 SMP
> Thu Oct 29 14:27:36 PDT 2015
> [ 0.000000] CPU: ARMv7 Processor [412fc09a] revision 10 (ARMv7), cr=10c5387d
> [ 0.000000] CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing
> instruction cache
> [ 0.000000] Machine model: Gateworks Ventana i.MX6 Dual/Quad GW52XX
> ...
> [ 2.044936] nand: device found, Manufacturer ID: 0x2c, Chip ID: 0xd5
> [ 2.051348] nand: Micron MT29F16G08ADACAH4
> [ 2.055461] nand: 2048 MiB, SLC, erase size: 256 KiB, page size:
> 4096, OOB size: 224
> [ 2.087272] gpmi-nand 112000.gpmi-nand: enable the asynchronous EDO mode 5
> [ 2.094196] Scanning device for bad blocks
> ...
> [ 3.710706] Creating 3 MTD partitions on "gpmi-nand":
> [ 3.715793] 0x000000000000-0x000001000000 : "uboot"
> [ 3.725027] 0x000001000000-0x000001100000 : "env"
> [ 3.732929] 0x000001100000-0x000080000000 : "rootfs"
> [ 3.741781] gpmi-nand 112000.gpmi-nand: driver registered.
> ...
> [ 4.366226] ubi0: attaching mtd2
> [ 8.445733] ubi0: scanning is finished
> [ 8.477551] ubi0: attached mtd2 (name "rootfs", size 2031 MiB)
> [ 8.483443] ubi0: PEB size: 262144 bytes (256 KiB), LEB size: 253952 bytes
> [ 8.490355] ubi0: min./max. I/O unit sizes: 4096/4096, sub-page size 4096
> [ 8.497155] ubi0: VID header offset: 4096 (aligned 4096), data offset: 8192
> [ 8.504148] ubi0: good PEBs: 8124, bad PEBs: 0, corrupted PEBs: 0
> [ 8.510274] ubi0: user volume: 1, internal volumes: 1, max. volumes
> count: 128
> [ 8.517510] ubi0: max/mean erase counter: 7/3, WL threshold: 4096,
> image sequence number: 42283325
> [ 8.526498] ubi0: available PEBs: 0, total reserved PEBs: 8124,
> PEBs reserved for bad PEB handling: 160
> [ 8.536151] ubi0: background thread "ubi_bgt0d" started, PID 82
> ...
> [ 8.635364] UBIFS (ubi0:0): recovery needed
> [ 8.676203] ubi0 warning: ubi_io_read: error -74 (ECC error) while
> reading 69632 bytes from PEB 2254:192512, read only 69632 bytes, retry
> [ 8.692460] ubi0 warning: ubi_io_read: error -74 (ECC error) while
> reading 69632 bytes from PEB 2254:192512, read only 69632 bytes, retry
> [ 8.708741] ubi0 warning: ubi_io_read: error -74 (ECC error) while
> reading 69632 bytes from PEB 2254:192512, read only 69632 bytes, retry
> ^^^^ non correctable ecc error on PEB 2254 - I verified that this was
> not the first time this PEB has been used
> [ 8.724987] ubi0 error: ubi_io_read: error -74 (ECC error) while
> reading 69632 bytes from PEB 2254:192512, read 69632 bytes
> [ 8.736163] CPU: 0 PID: 1 Comm: swapper/0 Not tainted
> 4.2.0-00015-g64291f7 dirty #195
> [ 8.744021] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
> [ 8.750582] Backtrace:
> [ 8.753074] [<8001400c>] (dump_backtrace) from [<800141f8>]
> (show_stack+0x18/0x1c)
> [ 8.760667] r6:80b2d960 r5:80b2d960 r4:00000000 r3:00000000
> [ 8.766424] [<800141e0>] (show_stack) from [<807ceeb8>]
> (dump_stack+0x8c/0xa4)
> [ 8.773687] [<807cee2c>] (dump_stack) from [<80474730>]
> (ubi_io_read+0x130/0x378)
> [ 8.781212] r6:00000000 r5:00011000 r4:ffffffb6 r3:be070000
> [ 8.786951] [<80474600>] (ubi_io_read) from [<80471564>]
> (ubi_eba_read_leb+0x19c/0x4c8)
> [ 8.794975] r10:00000000 r9:c0c63000 r8:000008ce r7:00001f15
> r6:00000000 r5:bda3a000
> [ 8.802928] r4:bda12000
> [ 8.805489] [<804713c8>] (ubi_eba_read_leb) from [<8047077c>]
> (ubi_leb_read+0x94/0x130)
> [ 8.813527] r10:00000000 r9:c0c63000 r8:bda3a000 r7:0002d000
> r6:00001f15 r5:00011000
> [ 8.821467] r4:bda12000
> [ 8.824037] [<804706e8>] (ubi_leb_read) from [<80257870>]
> (ubifs_leb_read+0x30/0x9c)
> [ 8.831815] r10:be29a000 r9:80b77ff8 r8:00001f15 r7:0002d000
> r6:00011000 r5:be29a000
> [ 8.839731] r4:bdaea600
> [ 8.842312] [<80257840>] (ubifs_leb_read) from [<80260fac>]
> (ubifs_start_scan+0x84/0x134)
> [ 8.850522] r8:c0c36000 r7:00001f15 r6:be29a000 r5:0002d000 r4:bdaea600
> [ 8.857322] [<80260f28>] (ubifs_start_scan) from [<80276270>]
> (ubifs_recover_leb+0x80/0xb80)
> [ 8.865794] r8:00001f15 r7:c0c63000 r6:00011000 r5:00001f15 r4:00000001
> [ 8.872609] [<802761f0>] (ubifs_recover_leb) from [<802627f0>]
> (ubifs_replay_journal+0xd34/0x1a48)
> [ 8.881601] r10:00017188 r9:0002d000 r8:be29a000 r7:00000000
> r6:bdaea608 r5:00001f15
> [ 8.889518] r4:00000001
> [ 8.892106] [<80261abc>] (ubifs_replay_journal) from [<80254954>]
> (ubifs_mount+0x136c/0x2170)
> [ 8.900666] r10:be29a008 r9:00000000 r8:0003db70 r7:be29a000
> r6:80b76870 r5:be3f6000
> [ 8.908584] r4:00000000
> [ 8.911171] [<802535e8>] (ubifs_mount) from [<801055f8>] (mount_fs+0x1c/0xb0)
> [ 8.918323] r10:00008001 r9:00000000 r8:80b2a2e0 r7:80124e34
> r6:80b2a2e0 r5:bd9cfac0
> [ 8.926258] r4:be365900
> [ 8.928829] [<801055dc>] (mount_fs) from [<80121868>]
> (vfs_kern_mount+0x5c/0x144)
> [ 8.936355] r6:00008001 r5:bd9cfac0 r4:be365900
> [ 8.941056] [<8012180c>] (vfs_kern_mount) from [<80124e34>]
> (do_mount+0x19c/0xd0c)
> [ 8.948642] r9:80b2a2e0 r8:80b1ebdc r7:bd9cfa80 r6:bd9cfac0
> r5:00000060 r4:00000000
> [ 8.956498] [<80124c98>] (do_mount) from [<80125d38>] (SyS_mount+0x7c/0xa8)
> [ 8.963493] r10:000000d7 r9:80afa7b0 r8:8099d910 r7:00008001
> r6:00000000 r5:bd9cfac0
> [ 8.971429] r4:bd9cfa80
> [ 8.973993] [<80125cbc>] (SyS_mount) from [<80aa6260>]
> (mount_block_root+0x114/0x240)
> [ 8.981856] r8:befa3a40 r7:80afa7a0 r6:bd9d2000 r5:bd9d2000 r4:00008001
> [ 8.988651] [<80aa614c>] (mount_block_root) from [<80aa664c>]
> (prepare_namespace+0x98/0x1d0)
> [ 8.997123] r10:000000d7 r9:80afa780 r8:80b087e4 r7:80afa778
> r6:80afa7a0 r5:80afa7a0
> [ 9.005058] r4:80afa7b0
> [ 9.007617] [<80aa65b4>] (prepare_namespace) from [<80aa5efc>]
> (kernel_init_freeable+0x1a4/0x1f0)
> [ 9.016522] r6:80b8b000 r5:80b8b000 r4:00000007
> [ 9.021217] [<80aa5d58>] (kernel_init_freeable) from [<807c91b4>]
> (kernel_init+0x10/0xf4)
> [ 9.029412] r10:00000000 r9:00000000 r8:00000000 r7:00000000
> r6:00000000 r5:807c91a4
> [ 9.037354] r4:00000000
> [ 9.039918] [<807c91a4>] (kernel_init) from [<800104b8>]
> (ret_from_fork+0x14/0x3c)
> [ 9.047525] r4:00000000 r3:00000000
> [ 9.051259] UBIFS error (ubi0:0 pid 1): ubifs_recover_leb: corrupt
> empty space LEB 7957:184320, corruption starts at 4665
> [ 9.062247] UBIFS error (ubi0:0 pid 1): ubifs_scanned_corruption:
> corruption at LEB 7957:188985
> [ 9.070972] UBIFS error (ubi0:0 pid 1): ubifs_scanned_corruption:
> first 8192 bytes from LEB 7957:188985
> [ 9.080396] 00000000: ffffffef ffffffff ffffffff ffffffff ffffffff
> ffffffff ffffffff ffffffff ................................
> [ 9.091903] 00000020: ffffffff ffffffff ffffffff ffffffff ffffffff
> ffffffff ffffffff ffffffff ................................
> ...
> [ 12.002997] 00001fc0: ffffffff ffffffff ffffffff ffffffff ffffffff
> ffffffff ffffffff ffffffff ................................
> [ 12.014501] 00001fe0: ffffffff ffffffff ffffffff ffffffff ffffffff
> ffffffff ffffffff ffffffff ................................
> [ 12.026000] UBIFS error (ubi0:0 pid 1): ubifs_recover_leb: LEB 7957
> scanning failed
> ^^^^ recovery failed
> [ 12.036886] VFS: Cannot open root device "ubi0:rootfs" or
> unknown-block(0,0): error -117
>
> After added Huang Shijie's 'mtd: gpmi: fix the bitflips for erased
> page' patch from http://patchwork.ozlabs.org/patch/309763/ which shows
> the following on board1:
> [ 8.532624] UBIFS (ubi0:0): recovery needed
> [ 8.569104] gpmi-nand 112000.gpmi-nand: The page(148656) is an
> erased page(1,6,1,1).
> [ 8.577748] gpmi-nand 112000.gpmi-nand: The page(148658) is an
> erased page(1,6,1,1).
> [ 8.586413] gpmi-nand 112000.gpmi-nand: The page(148660) is an
> erased page(1,6,1,1).
> [ 8.595050] gpmi-nand 112000.gpmi-nand: The page(148662) is an
> erased page(1,6,1,1).
> [ 8.603705] gpmi-nand 112000.gpmi-nand: The page(148664) is an
> erased page(1,6,1,1).
> [ 8.612310] gpmi-nand 112000.gpmi-nand: The page(148666) is an
> erased page(1,6,1,1).
> [ 8.620940] gpmi-nand 112000.gpmi-nand: The page(148668) is an
> erased page(1,6,1,1).
> [ 8.629580] gpmi-nand 112000.gpmi-nand: The page(148670) is an
> erased page(1,6,1,1).
> ^^^^ 1 bit-flip on ecc read, 1 bit-flip on non-ecc read, ecc-str=6,
> memset's buffer to 0xff to attempt to recover
> [ 8.644345] UBIFS (ubi0:0): recovery deferred
> [ 8.649209] UBIFS (ubi0:0): UBIFS: mounted UBI device 0, volume 0,
> name "rootfs", R/O mode
> [ 8.657537] UBIFS (ubi0:0): LEB size: 253952 bytes (248 KiB),
> min./max. I/O unit sizes: 4096 bytes/4096 bytes
> [ 8.667486] UBIFS (ubi0:0): FS size: 2018918400 bytes (1925 MiB,
> 7950 LEBs), journal size 9404416 bytes (8 MiB, 38 LEBs)
> [ 8.678409] UBIFS (ubi0:0): reserved for root: 0 bytes (0 KiB)
> [ 8.684283] UBIFS (ubi0:0): media format: w4/r0 (latest is w4/r0),
> UUID BB5E5EB5-6A21-48AA-9C31-DE6B4A32ABAD, small LPT model
> [ 8.697990] VFS: Mounted root (ubifs filesystem) readonly on device 0:14.
> ^^^^ able to recover and always ends up this way over further boots
> ^^^^ after applying this patch I boot with init=foo so as to not allow
> rootfs to get mounted read/write by udev and perhaps permanently
> recover the board
>
>
>
> Board2 looks like this:
> [ 0.000000] Booting Linux on physical CPU 0x0
> [ 0.000000] Linux version 4.2.0-00015-g64291f7 (tharvey at tharvey)
> (gcc version 4.8.3 (OpenWrt/Linaro GCC 4.8-2014.04 r43990) ) #195 SMP
> Thu Oct 29 14:27:36 PDT 2015
> [ 0.000000] CPU: ARMv7 Processor [412fc09a] revision 10 (ARMv7), cr=10c5387d
> [ 0.000000] CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing
> instruction cache
> [ 0.000000] Machine model: Gateworks Ventana i.MX6 Dual/Quad GW552X
> ...
> [ 0.000000] Kernel command line: console=ttymxc1,115200
> root=ubi0:rootfs ubi.mtd=2 rootfstype=ubifs dyndbg="func ubi_io_write
> +p" debug
> ...
> [ 2.341988] nand: device found, Manufacturer ID: 0x2c, Chip ID: 0xd5
> [ 2.348382] nand: Micron MT29F16G08ADACAH4
> [ 2.352495] nand: 2048 MiB, SLC, erase size: 256 KiB, page size:
> 4096, OOB size: 224
> [ 2.384301] gpmi-nand 112000.gpmi-nand: enable the asynchronous EDO mode 5
> [ 2.391224] Scanning device for bad blocks
> ...
> [ 4.024472] 3 ofpart partitions found on MTD device gpmi-nand
> [ 4.030265] Creating 3 MTD partitions on "gpmi-nand":
> [ 4.035336] 0x000000000000-0x000001000000 : "uboot"
> [ 4.044635] 0x000001000000-0x000001100000 : "env"
> [ 4.052559] 0x000001100000-0x000080000000 : "rootfs"
> [ 4.061452] gpmi-nand 112000.gpmi-nand: driver registered.
> ...
> [ 4.426420] ubi0: attaching mtd2
> [ 8.514755] ubi0: scanning is finished
> [ 8.541225] ubi0: attached mtd2 (name "rootfs", size 2031 MiB)
> [ 8.547108] ubi0: PEB size: 262144 bytes (256 KiB), LEB size: 253952 bytes
> [ 8.553994] ubi0: min./max. I/O unit sizes: 4096/4096, sub-page size 4096
> [ 8.560810] ubi0: VID header offset: 4096 (aligned 4096), data offset: 8192
> [ 8.567799] ubi0: good PEBs: 8124, bad PEBs: 0, corrupted PEBs: 0
> [ 8.573901] ubi0: user volume: 1, internal volumes: 1, max. volumes
> count: 128
> [ 8.581161] ubi0: max/mean erase counter: 4/2, WL threshold: 4096,
> image sequence number: 42283325
> [ 8.590148] ubi0: available PEBs: 0, total reserved PEBs: 8124,
> PEBs reserved for bad PEB handling: 160
> [ 8.599737] ubi0: background thread "ubi_bgt0d" started, PID 74
> ...
> [ 8.712166] UBIFS (ubi0:0): recovery needed
> [ 8.749882] UBIFS (ubi0:0): recovery deferred
> [ 8.754639] UBIFS (ubi0:0): UBIFS: mounted UBI device 0, volume 0,
> name "rootfs", R/O mode
> [ 8.762941] UBIFS (ubi0:0): LEB size: 253952 bytes (248 KiB),
> min./max. I/O unit sizes: 4096 bytes/4096 bytes
> [ 8.772883] UBIFS (ubi0:0): FS size: 2018918400 bytes (1925 MiB,
> 7950 LEBs), journal size 9404416 bytes (8 MiB, 38 LEBs)
> [ 8.783773] UBIFS (ubi0:0): reserved for root: 0 bytes (0 KiB)
> [ 8.789631] UBIFS (ubi0:0): media format: w4/r0 (latest is w4/r0),
> UUID BB5E5EB5-6A21-48AA-9C31-DE6B4A32ABAD, small LPT model
> [ 8.803366] VFS: Mounted root (ubifs filesystem) readonly on device 0:14.
> [ 8.813172] devtmpfs: mounted
> [ 8.816746] Freeing unused kernel memory: 420K (80aa5000 - 80b0e000)
> INIT: version 2.88 booting
> Starting udev
> [ 9.621167] udevd[106]: starting version 182
> [ 10.525521] UBIFS (ubi0:0): completing deferred recovery
> [ 10.540072] UBI DBG io (pid 162): write 4096 bytes to PEB 5891:4096
> [ 10.547480] UBI DBG io (pid 162): write 110592 bytes to PEB 5891:8192
> [ 10.578168] UBI DBG io (pid 162): write 4096 bytes to PEB 5892:4096
> [ 10.585053] UBI DBG io (pid 74): write 4096 bytes to PEB 2944:0
> [ 10.585639] UBI DBG io (pid 162): write 163840 bytes to PEB 5892:8192
> [ 10.607478] UBI DBG io (pid 74): write 4096 bytes to PEB 2945:0
> [ 10.615485] UBIFS (ubi0:0): background thread "ubifs_bgt0_0" started, PID 163
> [ 10.617094] UBI DBG io (pid 162): write 4096 bytes to PEB 5893:4096
> [ 10.618542] UBI DBG io (pid 162): write 4096 bytes to PEB 5893:8192
> [ 10.619487] UBI DBG io (pid 162): write 4096 bytes to PEB 2859:212992
> [ 10.620347] UBI DBG io (pid 162): write 4096 bytes to PEB 2850:225280
> ^^^^^ this is the PEB that gets uncorrectable ECC errors - I verified
> that this was not the first time this PEB has been used
> [ 10.621476] UBI DBG io (pid 162): write 4096 bytes to PEB 2851:225280
> [ 10.623058] UBIFS (ubi0:0): deferred recovery completed
> [ 10.628641] UBI DBG io (pid 74): write 4096 bytes to PEB 2946:0
> bootlogd: cannot allocate pseudo tty: No such file or directory
> Populating dev cache
> ALSA: Restoring mixer settings...
> /usr/sbin/alsactl: load_state:1729: No soundcards found...
> Sat Oct 24 11:28:04 UTC 2015
> INIT: Entering runlevel: 5
> Configuring network interfaces... ifconfig: SIOCGIFFLAGS: No such device
> Starting system message bus: dbus.
> Starting Dropbear SSH server: dropbear.
> Starting rpcbind daemon...done.
> Starting watchdog: [1
> 34m[ [1
> 32m OK [1
> 34m] [0
> 39m
> Starting advanced power management daemon: No APM support in kernel
> (failed.)
> starting DNS forwarder and DHCP server: dnsmasq... done.
>
> <power cut>
>
> [ 0.000000] Booting Linux on physical CPU 0x0
> [ 0.000000] Linux version 4.2.0-00015-g64291f7 (tharvey at tharvey)
> (gcc version 4.8.3 (OpenWrt/Linaro GCC 4.8-2014.04 r43990) ) #195 SMP
> Thu Oct 29 14:27:36 PDT 2015
> ...
> [ 2.332131] nand: device found, Manufacturer ID: 0x2c, Chip ID: 0xd5
> [ 2.338524] nand: Micron MT29F16G08ADACAH4
> [ 2.342638] nand: 2048 MiB, SLC, erase size: 256 KiB, page size:
> 4096, OOB size: 224
> [ 2.374442] gpmi-nand 112000.gpmi-nand: enable the asynchronous EDO mode 5
> [ 2.381366] Scanning device for bad blocks
> ...
> [ 4.013255] 3 ofpart partitions found on MTD device gpmi-nand
> [ 4.019040] Creating 3 MTD partitions on "gpmi-nand":
> [ 4.024111] 0x000000000000-0x000001000000 : "uboot"
> [ 4.033352] 0x000001000000-0x000001100000 : "env"
> [ 4.041210] 0x000001100000-0x000080000000 : "rootfs"
> [ 4.050164] gpmi-nand 112000.gpmi-nand: driver registered.
> ...
> [ 4.416614] ubi0: attaching mtd2
> [ 8.500977] ubi0: scanning is finished
> [ 8.527328] ubi0: attached mtd2 (name "rootfs", size 2031 MiB)
> [ 8.533184] ubi0: PEB size: 262144 bytes (256 KiB), LEB size: 253952 bytes
> [ 8.540089] ubi0: min./max. I/O unit sizes: 4096/4096, sub-page size 4096
> [ 8.546906] ubi0: VID header offset: 4096 (aligned 4096), data offset: 8192
> [ 8.553878] ubi0: good PEBs: 8124, bad PEBs: 0, corrupted PEBs: 0
> [ 8.559998] ubi0: user volume: 1, internal volumes: 1, max. volumes
> count: 128
> [ 8.567248] ubi0: max/mean erase counter: 4/2, WL threshold: 4096,
> image sequence number: 42283325
> [ 8.576216] ubi0: available PEBs: 0, total reserved PEBs: 8124,
> PEBs reserved for bad PEB handling: 160
> [ 8.585813] ubi0: background thread "ubi_bgt0d" started, PID 73
> ..
> [ 8.683337] ubi0 warning: ubi_io_read: error -74 (ECC error) while
> reading 253952 bytes from PEB 2850:8192, read only 253952 bytes, retry
> [ 8.710495] ubi0 warning: ubi_io_read: error -74 (ECC error) while
> reading 253952 bytes from PEB 2850:8192, read only 253952 bytes, retry
> [ 8.736821] ubi0 warning: ubi_io_read: error -74 (ECC error) while
> reading 253952 bytes from PEB 2850:8192, read only 253952 bytes, retry
> [ 8.763179] ubi0 error: ubi_io_read: error -74 (ECC error) while
> reading 253952 bytes from PEB 2850:8192, read 253952 bytes
> ^^^^ uncorrectable ECC error on PEB 2850
> [ 8.774340] CPU: 0 PID: 1 Comm: swapper/0 Not tainted
> 4.2.0-00015-g64291f7 #195
> [ 8.782194] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
> [ 8.788743] Backtrace:
> [ 8.791237] [<8001400c>] (dump_backtrace) from [<800141f8>]
> (show_stack+0x18/0x1c)
> [ 8.798839] r6:80b2d960 r5:80b2d960 r4:00000000 r3:00000000
> [ 8.804589] [<800141e0>] (show_stack) from [<807ceeb8>]
> (dump_stack+0x8c/0xa4)
> [ 8.811855] [<807cee2c>] (dump_stack) from [<80474730>]
> (ubi_io_read+0x130/0x378)
> [ 8.819359] r6:00000000 r5:0003e000 r4:ffffffb6 r3:9e468000
> [ 8.825099] [<80474600>] (ubi_io_read) from [<80471564>]
> (ubi_eba_read_leb+0x19c/0x4c8)
> [ 8.833125] r10:00000000 r9:a0c0d000 r8:00000b22 r7:00000001
> r6:00000000 r5:9dd8a000
> [ 8.841065] r4:9dd9e000
> [ 8.843628] [<804713c8>] (ubi_eba_read_leb) from [<8047077c>]
> (ubi_leb_read+0x94/0x130)
> [ 8.851661] r10:00000000 r9:a0c0d000 r8:9dd8a000 r7:00000000
> r6:00000001 r5:0003e000
> [ 8.859596] r4:9dd9e000
> [ 8.862170] [<804706e8>] (ubi_leb_read) from [<80257870>]
> (ubifs_leb_read+0x30/0x9c)
> [ 8.869937] r10:9e673008 r9:00000001 r8:00000001 r7:00000000
> r6:0003e000 r5:9e673000
> [ 8.877873] r4:9de22240
> [ 8.880437] [<80257840>] (ubifs_leb_read) from [<80260fac>]
> (ubifs_start_scan+0x84/0x134)
> [ 8.888637] r8:a0c0d000 r7:00000001 r6:9e673000 r5:00000000 r4:9de22240
> [ 8.895433] [<80260f28>] (ubifs_start_scan) from [<802613ac>]
> (ubifs_scan+0x30/0x3d4)
> [ 8.903284] r8:00000001 r7:9e673000 r6:a0c0d000 r5:00000000 r4:0003e000
> [ 8.910097] [<8026137c>] (ubifs_scan) from [<8025fcd8>]
> (ubifs_read_master+0x48/0xd78)
> [ 8.918034] r10:9e673008 r9:9dd7bf80 r8:9e673000 r7:00000001
> r6:80b76870 r5:9dda1800
> [ 8.925952] r4:00000001
> [ 8.928533] [<8025fc90>] (ubifs_read_master) from [<80254714>]
> (ubifs_mount+0x112c/0x2170)
> [ 8.936819] r10:9e673008 r9:9dd7bf80 r8:9e673b38 r7:9e673000
> r6:80b76870 r5:9dda1800
> [ 8.944738] r4:00000001
> [ 8.947326] [<802535e8>] (ubifs_mount) from [<801055f8>] (mount_fs+0x1c/0xb0)
> [ 8.954466] r10:00008001 r9:00000000 r8:80b2a2e0 r7:80124e34
> r6:80b2a2e0 r5:9dd73c80
> [ 8.962404] r4:9dd76500
> [ 8.964975] [<801055dc>] (mount_fs) from [<80121868>]
> (vfs_kern_mount+0x5c/0x144)
> [ 8.972479] r6:00008001 r5:9dd73c80 r4:9dd76500
> [ 8.977177] [<8012180c>] (vfs_kern_mount) from [<80124e34>]
> (do_mount+0x19c/0xd0c)
> [ 8.984750] r9:80b2a2e0 r8:80b1ebdc r7:9e7b5d00 r6:9dd73c80
> r5:00000060 r4:00000000
> [ 8.992605] [<80124c98>] (do_mount) from [<80125d38>] (SyS_mount+0x7c/0xa8)
> [ 8.999588] r10:000000d7 r9:80afa7b0 r8:8099d910 r7:00008001
> r6:00000000 r5:9dd73c80
> [ 9.007522] r4:9e7b5d00
> [ 9.010087] [<80125cbc>] (SyS_mount) from [<80aa6260>]
> (mount_block_root+0x114/0x240)
> [ 9.017941] r8:9efacd20 r7:80afa7a0 r6:9de69000 r5:9de69000 r4:00008001
> [ 9.024736] [<80aa614c>] (mount_block_root) from [<80aa664c>]
> (prepare_namespace+0x98/0x1d0)
> [ 9.033195] r10:000000d7 r9:80afa780 r8:80b087e4 r7:80afa778
> r6:80afa7a0 r5:80afa7a0
> [ 9.041129] r4:80afa7b0
> [ 9.043690] [<80aa65b4>] (prepare_namespace) from [<80aa5efc>]
> (kernel_init_freeable+0x1a4/0x1f0)
> [ 9.052592] r6:80b8b000 r5:80b8b000 r4:00000007
> [ 9.057290] [<80aa5d58>] (kernel_init_freeable) from [<807c91b4>]
> (kernel_init+0x10/0xf4)
> [ 9.065472] r10:00000000 r9:00000000 r8:00000000 r7:00000000
> r6:00000000 r5:807c91a4
> [ 9.073407] r4:00000000
> [ 9.075970] [<807c91a4>] (kernel_init) from [<800104b8>]
> (ret_from_fork+0x14/0x3c)
> [ 9.083560] r4:00000000 r3:00000000
> [ 9.102369] ubi0 warning: ubi_io_read: error -74 (ECC error) while
> reading 253952 bytes from PEB 2850:8192, read only 253952 bytes, retry
> [ 9.128774] ubi0 warning: ubi_io_read: error -74 (ECC error) while
> reading 253952 bytes from PEB 2850:8192, read only 253952 bytes, retry
> [ 9.155184] ubi0 warning: ubi_io_read: error -74 (ECC error) while
> reading 253952 bytes from PEB 2850:8192, read only 253952 bytes, retry
> [ 9.181522] ubi0 error: ubi_io_read: error -74 (ECC error) while
> reading 253952 bytes from PEB 2850:8192, read 253952 bytes
> ^^^^ unrecoverable ECC error on PEB 2850
> [ 9.192683] CPU: 0 PID: 1 Comm: swapper/0 Not tainted
> 4.2.0-00015-g64291f7 #195
> [ 9.200535] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
> [ 9.207083] Backtrace:
> [ 9.209573] [<8001400c>] (dump_backtrace) from [<800141f8>]
> (show_stack+0x18/0x1c)
> [ 9.217173] r6:80b2d960 r5:80b2d960 r4:00000000 r3:00000000
> [ 9.222916] [<800141e0>] (show_stack) from [<807ceeb8>]
> (dump_stack+0x8c/0xa4)
> [ 9.230177] [<807cee2c>] (dump_stack) from [<80474730>]
> (ubi_io_read+0x130/0x378)
> [ 9.237688] r6:00000000 r5:0003e000 r4:ffffffb6 r3:9e468000
> [ 9.243429] [<80474600>] (ubi_io_read) from [<80471564>]
> (ubi_eba_read_leb+0x19c/0x4c8)
> [ 9.251458] r10:00000000 r9:a0c4c000 r8:00000b22 r7:00000001
> r6:00000000 r5:9dd8a000
> [ 9.259398] r4:9dd9e000
> [ 9.261962] [<804713c8>] (ubi_eba_read_leb) from [<8047077c>]
> (ubi_leb_read+0x94/0x130)
> [ 9.269988] r10:00000000 r9:a0c4c000 r8:9dd8a000 r7:00000000
> r6:00000001 r5:0003e000
> [ 9.277925] r4:9dd9e000
> [ 9.280495] [<804706e8>] (ubi_leb_read) from [<80257870>]
> (ubifs_leb_read+0x30/0x9c)
> [ 9.288264] r10:9e673000 r9:9dd7bf80 r8:00000001 r7:00000000
> r6:0003e000 r5:9e673000
> [ 9.296182] r4:9e673000
> [ 9.298768] [<80257840>] (ubifs_leb_read) from [<8027569c>]
> (get_master_node+0x54/0x324)
> [ 9.306880] r8:9e673000 r7:80b77ff8 r6:00001000 r5:00001000 r4:9e673000
> [ 9.313681] [<80275648>] (get_master_node) from [<80275d98>]
> (ubifs_recover_master_node+0x60/0x3ec)
> [ 9.322749] r10:9e673008 r9:9dd7bf80 r8:9e673000 r7:80b77ff8
> r6:00001000 r5:9dda1800
> [ 9.330685] r4:9e673000
> [ 9.333248] [<80275d38>] (ubifs_recover_master_node) from
> [<8026080c>] (ubifs_read_master+0xb7c/0xd78)
> [ 9.342575] r7:00000001 r6:80b76870 r5:9dda1800 r4:ffffff8b
> [ 9.348335] [<8025fc90>] (ubifs_read_master) from [<80254714>]
> (ubifs_mount+0x112c/0x2170)
> [ 9.356621] r10:9e673008 r9:9dd7bf80 r8:9e673b38 r7:9e673000
> r6:80b76870 r5:9dda1800
> [ 9.364540] r4:00000001
> [ 9.367127] [<802535e8>] (ubifs_mount) from [<801055f8>] (mount_fs+0x1c/0xb0)
> [ 9.374266] r10:00008001 r9:00000000 r8:80b2a2e0 r7:80124e34
> r6:80b2a2e0 r5:9dd73c80
> [ 9.382204] r4:9dd76500
> [ 9.384772] [<801055dc>] (mount_fs) from [<80121868>]
> (vfs_kern_mount+0x5c/0x144)
> [ 9.392276] r6:00008001 r5:9dd73c80 r4:9dd76500
> [ 9.396974] [<8012180c>] (vfs_kern_mount) from [<80124e34>]
> (do_mount+0x19c/0xd0c)
> [ 9.404547] r9:80b2a2e0 r8:80b1ebdc r7:9e7b5d00 r6:9dd73c80
> r5:00000060 r4:00000000
> [ 9.412404] [<80124c98>] (do_mount) from [<80125d38>] (SyS_mount+0x7c/0xa8)
> [ 9.419386] r10:000000d7 r9:80afa7b0 r8:8099d910 r7:00008001
> r6:00000000 r5:9dd73c80
> [ 9.427320] r4:9e7b5d00
> [ 9.429887] [<80125cbc>] (SyS_mount) from [<80aa6260>]
> (mount_block_root+0x114/0x240)
> [ 9.437743] r8:9efacd20 r7:80afa7a0 r6:9de69000 r5:9de69000 r4:00008001
> [ 9.444539] [<80aa614c>] (mount_block_root) from [<80aa664c>]
> (prepare_namespace+0x98/0x1d0)
> [ 9.452999] r10:000000d7 r9:80afa780 r8:80b087e4 r7:80afa778
> r6:80afa7a0 r5:80afa7a0
> [ 9.460933] r4:80afa7b0
> [ 9.463494] [<80aa65b4>] (prepare_namespace) from [<80aa5efc>]
> (kernel_init_freeable+0x1a4/0x1f0)
> [ 9.472386] r6:80b8b000 r5:80b8b000 r4:00000007
> [ 9.477081] [<80aa5d58>] (kernel_init_freeable) from [<807c91b4>]
> (kernel_init+0x10/0xf4)
> [ 9.485262] r10:00000000 r9:00000000 r8:00000000 r7:00000000
> r6:00000000 r5:807c91a4
> [ 9.493196] r4:00000000
> [ 9.495758] [<807c91a4>] (kernel_init) from [<800104b8>]
> (ret_from_fork+0x14/0x3c)
> [ 9.503349] r4:00000000 r3:00000000
> [ 9.507115] UBIFS error (ubi0:0 pid 1): ubifs_recover_master_node:
> failed to recover master node
> ^^^ master node recovery fails
> ...
> [ 9.618335] Kernel panic - not syncing: VFS: Unable to mount root
> fs on unknown-block(0,0)
>
> After added Huang Shijie's 'mtd: gpmi: fix the bitflips for erased
> page' patch from http://patchwork.ozlabs.org/patch/309763/ which shows
> the following on board2:
> [ 8.605463] gpmi-nand 112000.gpmi-nand: The page(186809) is an
> erased page(7,6,1,0).
> [ 8.628012] gpmi-nand 112000.gpmi-nand: The page(186809) is an
> erased page(7,6,1,0).
> ^^^ 1 bit-flip on ecc read, 0 bit-flip on non-ecc read, ecc-str=6,
> memset's buffer to 0xff to attempt to recover
> [ 8.637305] UBIFS error (ubi0:0 pid 1): ubifs_recover_master_node:
> failed to recover master node
> ^^^ unable to recover and always ends up this way over further boots
>
>>> If I understand what you are saying you are thinking that my issue is
>>> instead the result of a never-used PEB that had bit-flips from the
>>> manufacturer in which case the bits would read the same every time?
>>> How can we know this PEB was never before used and isn't one that was
>>> being erased/written during a power cut?
>>
>> I've seen bit flips on cheap SLC NANDs which came out of a sudden.
>> According to the FAE I was talking to this is legit for NAND
>> as long the flipping bits are fixable by the ECC engine.
>>
>
> I'm a little confused by the fact that gpmi-nand.c encounters an
> uncorrectable status yet when the bit-flips are counted by Huang's
> 'gmpi_erased_check' they come out to 1 which is less than the ecc str.
> Is this saying that ecc already corrected what it could and there was
> 'still' 1 more flip?
>
>>> In my test scenario where the rootfs is mounted from the kernel
>>> read-only, but later mounted read-write by userspace (yet not being
>>> specifically written to by userspace) then power-cut should 'any' NAND
>>> writes would be occurring at all? And if not as I suspect, then how
>>> could a subsequent boot end up using a PEB that may have been never
>>> previously used and have bit-flips from the manufacturer?
>>
>> UBIFS's has a wandering journal. During the remount it moved maybe.
>> But for a more expressive analysis I'd need a nanddump to find out which
>> blocks are in which role.
>> Can you share the nanddump?
>>
>
> This makes sense, and I apparently lied when I said the test software
> wasn't writing to the rootfs. There is at least one process that does
> write: the dhcp client when it writes to /etc/resolv.con and possibly
> something else writes as well. So we are power-cutting an an un-clean
> ubifs and never allowing a safe shutdown so I would expect the journal
> to be wandering around.
>
>>> Should we be doing an erase block on every NAND block during our board
>>> manufacturing process to avoid this?
>>
>> Sorry, I don't understand this sentence.
>> Do you mean a full erasure of the whole NAND?
>> If so, it would not help as the bit flips can come later.
>> (Without writing/erasing the block)
>> The root cause is that your NFC cannot correct bit flips on empty pages.
>>
>>> It sounds like this 'unexpected bit-flips on erased pages from the
>>> mfg' issue is a ticking time-bomb for people using ubi/ubifs NAND.
>>> Shouldn't the http://www.linux-mtd.infradead.org/doc/ubifs.html page
>>> be updated to refer to this known issue as well as the unstable bit
>>> issue?
>>
>> As I said the root cause is that some NFCs cannot correct bit flips on empty
>> pages.
>> Instead of putting warnings to ubifs.html I'd love to see a solution on the
>> said drivers or MTD core.
>>
>>> I can add some debugging to find out - what specifically would be
>>> helpful to add?
>>
>> A hexdump of the buffer would be a good start.
>>
>>> Thanks for the help!
>>
>> Thanks for sharing your issues. This is the only way
>> to address them.
>> That said, as far on no board I had access to I was able to reproduce the unstable bits
>> issue. It was always something else.
>>
>> Thanks,
>> //richard
>
> I can provide dumps of various blocks if needed or add more debugging
> on the reads after the corruption occurred. It takes me a day or two
> to re-create the issue if you want me to turn on more debugging during
> the writes.
>
> I've cc'd Huang, Elie, and Brian who were involved in the patch to
> detect bit-flips in gpmi-nand.c reads - perhaps they have some more
> ideas. I find it interesting that in one case that patch resolves the
> issue and in the other it does not.
>
> Regards,
>
> Tim
Cc: Huang Shijie <shijie.huang at arm.com>
Updated Huang's e-mail as the old one bounced
Tim
More information about the linux-mtd
mailing list