UBI wl_tree_add problems after PEB scrubbed

Zach Sadecki zsadecki at itwatchdogs.com
Fri Nov 30 10:05:26 EST 2012


Every time I see UBI scrub a PEB with fixable bit-flips (on my custom 
Freescale i.MX28 board) the background thread has problems shortly 
thereafter.  I'm not exactly sure where to start debugging this and I'm 
hoping someone can help point me in the right direction.  Below are 
kernel messages showing the problem from 2 different runs (in which both 
ended up with a hung CPU).  This is using kernel 3.7-rc7.

Also worth noting is that I had to modify the gpmi-nand driver to 
actually report max_bitflips back to the MTD layer to even get to this 
point (before that everything would just run along happily until it hit 
an uncorrectable ECC error).  I will submit a patch for this once 
everything seems OK...

Kernel messages:
[18444.860000] UBI: fixable bit-flip detected at PEB 2789
[18444.870000] UBI: schedule PEB 2789 for scrubbing
[18444.960000] UBI: fixable bit-flip detected at PEB 2789
[18445.120000] UBI: scrubbed PEB 2789 (LEB 3:544), data moved to PEB 1574
[18457.040000] UBI assert failed in wl_tree_add at 204 (pid 31)
[18457.040000] [<c000d8fc>] (unwind_backtrace+0x0/0xf4) from 
[<c0279758>] (wl_tree_add+0xb8/0xd0)
[18457.050000] [<c0279758>] (wl_tree_add+0xb8/0xd0) from [<c027a008>] 
(erase_worker+0x3bc/0x6c0)
[18457.060000] [<c027a008>] (erase_worker+0x3bc/0x6c0) from [<c027961c>] 
(do_work+0x8c/0x110)
[18457.070000] [<c027961c>] (do_work+0x8c/0x110) from [<c027adf4>] 
(ubi_thread+0xc4/0x148)
[18457.080000] [<c027adf4>] (ubi_thread+0xc4/0x148) from [<c00370c8>] 
(kthread+0xa0/0xac)
[18457.090000] [<c00370c8>] (kthread+0xa0/0xac) from [<c00092e8>] 
(ret_from_fork+0x14/0x2c)
[21386.860000] UBI: fixable bit-flip detected at PEB 2467
[21386.870000] UBI: schedule PEB 2467 for scrubbing
[21387.000000] UBI: fixable bit-flip detected at PEB 2467
[21387.190000] UBI: scrubbed PEB 2467 (LEB 3:490), data moved to PEB 1574
[21400.920000] UBI assert failed in wl_tree_add at 204 (pid 31)
[21400.920000] [<c000d8fc>] (unwind_backtrace+0x0/0xf4) from 
[<c0279758>] (wl_tree_add+0xb8/0xd0)
[21400.930000] [<c0279758>] (wl_tree_add+0xb8/0xd0) from [<c027a008>] 
(erase_worker+0x3bc/0x6c0)
[21400.940000] [<c027a008>] (erase_worker+0x3bc/0x6c0) from [<c027961c>] 
(do_work+0x8c/0x110)
[21400.950000] [<c027961c>] (do_work+0x8c/0x110) from [<c027adf4>] 
(ubi_thread+0xc4/0x148)
[21400.960000] [<c027adf4>] (ubi_thread+0xc4/0x148) from [<c00370c8>] 
(kthread+0xa0/0xac)
[21400.970000] [<c00370c8>] (kthread+0xa0/0xac) from [<c00092e8>] 
(ret_from_fork+0x14/0x2c)
[26168.070000] BUG: soft lockup - CPU#0 stuck for 22s! [ubi_bgt0d:31]
[26168.070000] irq event stamp: 27382734
[26168.070000] hardirqs last  enabled at (27382733): [<c0008e88>] 
__irq_svc+0x48/0x54
[26168.070000] hardirqs last disabled at (27382734): [<c0008e74>] 
__irq_svc+0x34/0x54
[26168.070000] softirqs last  enabled at (27382732): [<c001ea48>] 
__do_softirq+0x1ac/0x22c
[26168.070000] softirqs last disabled at (27382725): [<c001ebb0>] 
irq_exit+0x84/0x94
[26168.070000]
[26168.070000] Pid: 31, comm:            ubi_bgt0d
[26168.070000] CPU: 0    Not tainted  (3.7.0-rc7-itw #15)
[26168.070000] PC is at wl_tree_add+0x44/0xd0
[26168.070000] LR is at erase_worker+0x3bc/0x6c0
[26168.070000] pc : [<c02796e4>]    lr : [<c027a008>] psr: 80000013
[26168.070000] sp : c76d9e90  ip : c08ea20c  fp : c763add8
[26168.070000] r10: 00000000  r9 : 00000043  r8 : c763ad78
[26168.070000] r7 : c76d8000  r6 : c77073c0  r5 : c7707410  r4 : c7707408
[26168.070000] r3 : 00000046  r2 : 00000044  r1 : c7707348  r0 : c77073c0
[26168.070000] Flags: Nzcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM 
Segment kernel
[26168.070000] Control: 0005317f  Table: 460a0000  DAC: 00000017
[26168.070000] [<c000d8fc>] (unwind_backtrace+0x0/0xf4) from 
[<c005fc94>] (watchdog_timer_fn+0x15c/0x1b0)
[26168.070000] [<c005fc94>] (watchdog_timer_fn+0x15c/0x1b0) from 
[<c003b24c>] (__run_hrtimer+0xd8/0x1e8)
[26168.070000] [<c003b24c>] (__run_hrtimer+0xd8/0x1e8) from [<c003b724>] 
(hrtimer_interrupt+0x104/0x348)
[26168.070000] [<c003b724>] (hrtimer_interrupt+0x104/0x348) from 
[<c0013390>] (mxs_timer_interrupt+0x20/0x30)
[26168.070000] [<c0013390>] (mxs_timer_interrupt+0x20/0x30) from 
[<c00603e4>] (handle_irq_event_percpu+0x70/0x24c)
[26168.070000] [<c00603e4>] (handle_irq_event_percpu+0x70/0x24c) from 
[<c00605fc>] (handle_irq_event+0x3c/0x5c)
[26168.070000] [<c00605fc>] (handle_irq_event+0x3c/0x5c) from 
[<c0062d70>] (handle_level_irq+0x90/0x120)
[26168.070000] [<c0062d70>] (handle_level_irq+0x90/0x120) from 
[<c0060360>] (generic_handle_irq+0x30/0x3c)
[26168.070000] [<c0060360>] (generic_handle_irq+0x30/0x3c) from 
[<c0009abc>] (handle_IRQ+0x30/0x84)
[26168.070000] [<c0009abc>] (handle_IRQ+0x30/0x84) from [<c0008760>] 
(icoll_handle_irq+0x34/0x48)
[26168.070000] [<c0008760>] (icoll_handle_irq+0x34/0x48) from 
[<c0008e84>] (__irq_svc+0x44/0x54)
[26168.070000] Exception stack(0xc76d9e48 to 0xc76d9e90)
[26168.070000] 9e40:                   c77073c0 c7707348 00000044 
00000046 c7707408 c7707410
[26168.070000] 9e60: c77073c0 c76d8000 c763ad78 00000043 00000000 
c763add8 c08ea20c c76d9e90
[26168.070000] 9e80: c027a008 c02796e4 80000013 ffffffff
[26168.070000] [<c0008e84>] (__irq_svc+0x44/0x54) from [<c02796e4>] 
(wl_tree_add+0x44/0xd0)
[26168.070000] [<c02796e4>] (wl_tree_add+0x44/0xd0) from [<c027a008>] 
(erase_worker+0x3bc/0x6c0)
[26168.070000] [<c027a008>] (erase_worker+0x3bc/0x6c0) from [<c027961c>] 
(do_work+0x8c/0x110)
[26168.070000] [<c027961c>] (do_work+0x8c/0x110) from [<c027adf4>] 
(ubi_thread+0xc4/0x148)
[26168.070000] [<c027adf4>] (ubi_thread+0xc4/0x148) from [<c00370c8>] 
(kthread+0xa0/0xac)
[26168.070000] [<c00370c8>] (kthread+0xa0/0xac) from [<c00092e8>] 
(ret_from_fork+0x14/0x2c)



[13111.940000] UBI: fixable bit-flip detected at PEB 2822
[13111.940000] UBI: schedule PEB 2822 for scrubbing
[13112.030000] UBI: fixable bit-flip detected at PEB 2822
[13112.200000] UBI: scrubbed PEB 2822 (LEB 3:634), data moved to PEB 2766
[13124.260000] UBI assert failed in wl_tree_add at 204 (pid 31)
[13124.270000] [<c000d8fc>] (unwind_backtrace+0x0/0xf4) from 
[<c027974c>] (wl_tree_add+0xb8/0xd0)
[13124.280000] [<c027974c>] (wl_tree_add+0xb8/0xd0) from [<c0279ffc>] 
(erase_worker+0x3bc/0x6c0)
[13124.290000] [<c0279ffc>] (erase_worker+0x3bc/0x6c0) from [<c0279610>] 
(do_work+0x8c/0x110)
[13124.300000] [<c0279610>] (do_work+0x8c/0x110) from [<c027ade8>] 
(ubi_thread+0xc4/0x148)
[13124.300000] [<c027ade8>] (ubi_thread+0xc4/0x148) from [<c00370c8>] 
(kthread+0xa0/0xac)
[13124.310000] [<c00370c8>] (kthread+0xa0/0xac) from [<c00092e8>] 
(ret_from_fork+0x14/0x2c)
[13361.320000] Unable to handle kernel NULL pointer dereference at 
virtual address 00000000
[13361.330000] pgd = c5c64000
[13361.330000] [00000000] *pgd=45c4f831, *pte=00000000, *ppte=00000000
[13361.340000] Internal error: Oops: 817 [#1] ARM
[13361.340000] CPU: 0    Not tainted  (3.7.0-rc7-itw #12)
[13361.340000] PC is at prot_queue_del+0x7c/0xcc
[13361.340000] LR is at ubi_wl_put_peb+0x1d8/0x2d8
[13361.340000] pc : [<c0279070>]    lr : [<c027ac24>] psr: 60000013
[13361.340000] sp : c6f07cd0  ip : c7720228  fp : 00000265
[13361.340000] r10: 00000f75  r9 : 00000003  r8 : c763add8
[13361.340000] r7 : c763ad70  r6 : c763a000  r5 : c7720228  r4 : c763ad74
[13361.340000] r3 : 00200200  r2 : c77202b9  r1 : 00000000  r0 : c763a000
[13361.340000] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM 
Segment user
[13361.340000] Control: 0005317f  Table: 45c64000  DAC: 00000015
[13361.340000] Process sqlite3 (pid: 6567, stack limit = 0xc6f061b8)
[13361.340000] Stack: (0xc6f07cd0 to 0xc6f08000)
[13361.340000] 7cc0:                                     c763ad74 
c027ac24 00000000 c0056088
[13361.340000] 7ce0: 00000000 c763acc4 c76cfc00 00000000 c763a000 
00000003 00000f75 00000265
[13361.340000] 7d00: 00000000 c0275a94 00000000 c03cf0c8 c6f07dc0 
c6e65000 00000265 c6e65000
[13361.340000] 7d20: c6cd6000 c6f07dc0 00000800 c019b45c c6f07dc0 
00000000 c6f07dc0 00000000
[13361.340000] 7d40: c6e65000 c01a779c 00000000 00000000 00000000 
00000000 c6e65b20 00000001
[13361.340000] 7d60: c6cd6000 00000265 c6e652f4 c6e652f4 c6e65000 
c6cd6108 c6cd6000 00000000
[13361.340000] 7d80: 000001b5 c6e65000 00009000 c6cd6000 c6f07dc0 
00000800 00000000 c01a7e30
[13361.340000] 7da0: c01ab440 00000007 00000000 c6f06000 c6c76180 
c6f06000 c6cd6024 c6e65a80
[13361.340000] 7dc0: 00000000 0001f000 00000010 00000265 c5c09ca8 
c6e65bd0 c6cd612c ffffffe4
[13361.340000] 7de0: 000001b5 00000000 c6cd6108 00000000 c6cd612c 
c6e65000 00000001 c018df00
[13361.340000] 7e00: c725bb20 00000003 00000060 00000004 00000080 
c6f06000 00000188 c6e652b4
[13361.340000] 7e20: 00000000 00000000 00000108 00000001 c725d020 
c00bc1d4 00000002 00000000
[13361.340000] 7e40: c725d020 c6d82400 c6e65000 c72f3b20 c72f40c0 
c725d020 000000a0 c018ffd8
[13361.340000] 7e60: c6c76180 60000013 c72f3d38 c6f06000 00000188 
00000047 000000a0 00000048
[13361.340000] 7e80: 00000000 00000001 22222222 c72f3d5c 22222222 
22222222 22222222 22222222
[13361.340000] 7ea0: c725d020 00000000 c725d020 c72f3b20 c72f40a8 
c6e65000 c6f07edc 00000048
[13361.340000] 7ec0: 00000000 c0193ffc 00000001 00000000 00000001 
00000001 c00ccad0 00200020
[13361.340000] 7ee0: 00000000 00000000 00000000 00000278 22222222 
22222222 00000000 c725d020
[13361.340000] 7f00: 00000000 c72f40a8 c6f07f38 c72f3b20 c6f06000 
00000000 00000001 c00ccb04
[13361.340000] 7f20: c6f07f38 000093c4 00000000 c6f3a000 c72f40a8 
c00ce958 c740ee70 c715cc78
[13361.340000] 7f40: 0febbc64 0000000e c6f3a01d 00000000 00000000 
c7138d20 c725d020 00000000
[13361.340000] 7f60: 00000002 00000000 00000000 60000010 00000000 
00000000 00000001 c0056088
[13361.340000] 7f80: c6f06000 0051da2c 00000000 0051d418 0051da2c 
00000000 0051d418 0000000a
[13361.340000] 7fa0: c00093c4 c0009240 0051da2c 00000000 0051da2c 
0051da2c 00000000 b6f2576c
[13361.340000] 7fc0: 0051da2c 00000000 0051d418 0000000a b6f24b24 
00000000 00000000 00000001
[13361.340000] 7fe0: 00000000 bec32f28 b6ed9064 b6deb62c 60000010 
0051da2c 00000000 00000000
[13361.340000] [<c0279070>] (prot_queue_del+0x7c/0xcc) from [<c027ac24>] 
(ubi_wl_put_peb+0x1d8/0x2d8)
[13361.340000] [<c027ac24>] (ubi_wl_put_peb+0x1d8/0x2d8) from 
[<c0275a94>] (ubi_eba_unmap_leb+0x90/0xa8)
[13361.340000] [<c0275a94>] (ubi_eba_unmap_leb+0x90/0xa8) from 
[<c019b45c>] (ubifs_leb_unmap+0x78/0xf4)
[13361.340000] [<c019b45c>] (ubifs_leb_unmap+0x78/0xf4) from 
[<c01a779c>] (ubifs_garbage_collect_leb+0x23c/0x7b0)
[13361.340000] [<c01a779c>] (ubifs_garbage_collect_leb+0x23c/0x7b0) from 
[<c01a7e30>] (ubifs_garbage_collect+0x120/0x3b8)
[13361.340000] [<c01a7e30>] (ubifs_garbage_collect+0x120/0x3b8) from 
[<c018df00>] (make_reservation+0x110/0x428)
[13361.340000] [<c018df00>] (make_reservation+0x110/0x428) from 
[<c018ffd8>] (ubifs_jnl_update+0x120/0x640)
[13361.340000] [<c018ffd8>] (ubifs_jnl_update+0x120/0x640) from 
[<c0193ffc>] (ubifs_unlink+0x168/0x260)
[13361.340000] [<c0193ffc>] (ubifs_unlink+0x168/0x260) from [<c00ccb04>] 
(vfs_unlink+0x84/0xfc)
[13361.340000] [<c00ccb04>] (vfs_unlink+0x84/0xfc) from [<c00ce958>] 
(do_unlinkat+0xd4/0x160)
[13361.340000] [<c00ce958>] (do_unlinkat+0xd4/0x160) from [<c0009240>] 
(ret_fast_syscall+0x0/0x38)
[13361.340000] Code: e59c2000 e59c1004 e59f3040 e5821004 (e5812000)
[13361.740000] ---[ end trace 8fb6e4d7822cc150 ]---
[57160.070000] BUG: soft lockup - CPU#0 stuck for 22s! [init:1]
[57160.070000] irq event stamp: 1311500
[57160.070000] hardirqs last  enabled at (1311499): [<c00654d8>] 
rcu_sched_qs+0x8c/0xc8
[57160.070000] hardirqs last disabled at (1311500): [<c03d15ec>] 
_raw_spin_lock_irq+0x24/0x54
[57160.070000] softirqs last  enabled at (1311474): [<c001ea48>] 
__do_softirq+0x1ac/0x22c
[57160.070000] softirqs last disabled at (1311465): [<c001ebb0>] 
irq_exit+0x84/0x94
[57160.070000]
[57160.070000] Pid: 1, comm:                 init
[57160.070000] CPU: 0    Tainted: G      D       (3.7.0-rc7-itw #12)
[57160.070000] PC is at wait_consider_task+0x4/0xb14
[57160.070000] LR is at do_wait+0x104/0x200
[57160.070000] pc : [<c001ac78>]    lr : [<c001b88c>] psr: 20000013
[57160.070000] sp : c7439f0c  ip : 00000005  fp : c7439f78
[57160.070000] r10: c7438000  r9 : c7439f34  r8 : c7430000
[57160.070000] r7 : c7439f60  r6 : c74300dc  r5 : 00000000  r4 : c6c728a0
[57160.070000] r3 : c6c72984  r2 : c6c728a0  r1 : 00000000  r0 : c7439f60
[57160.070000] Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM 
Segment user
[57160.070000] Control: 0005317f  Table: 4779c000  DAC: 00000015
[57160.070000] [<c000d8fc>] (unwind_backtrace+0x0/0xf4) from 
[<c005fc94>] (watchdog_timer_fn+0x15c/0x1b0)
[57160.070000] [<c005fc94>] (watchdog_timer_fn+0x15c/0x1b0) from 
[<c003b24c>] (__run_hrtimer+0xd8/0x1e8)
[57160.070000] [<c003b24c>] (__run_hrtimer+0xd8/0x1e8) from [<c003b724>] 
(hrtimer_interrupt+0x104/0x348)
[57160.070000] [<c003b724>] (hrtimer_interrupt+0x104/0x348) from 
[<c0013390>] (mxs_timer_interrupt+0x20/0x30)
[57160.070000] [<c0013390>] (mxs_timer_interrupt+0x20/0x30) from 
[<c00603e4>] (handle_irq_event_percpu+0x70/0x24c)
[57160.070000] [<c00603e4>] (handle_irq_event_percpu+0x70/0x24c) from 
[<c00605fc>] (handle_irq_event+0x3c/0x5c)
[57160.070000] [<c00605fc>] (handle_irq_event+0x3c/0x5c) from 
[<c0062d70>] (handle_level_irq+0x90/0x120)
[57160.070000] [<c0062d70>] (handle_level_irq+0x90/0x120) from 
[<c0060360>] (generic_handle_irq+0x30/0x3c)
[57160.070000] [<c0060360>] (generic_handle_irq+0x30/0x3c) from 
[<c0009abc>] (handle_IRQ+0x30/0x84)
[57160.070000] [<c0009abc>] (handle_IRQ+0x30/0x84) from [<c0008760>] 
(icoll_handle_irq+0x34/0x48)
[57160.070000] [<c0008760>] (icoll_handle_irq+0x34/0x48) from 
[<c0008e84>] (__irq_svc+0x44/0x54)
[57160.070000] Exception stack(0xc7439ec0 to 0xc7439f08)
[57160.070000] 9ec0: c7439f60 00000000 c6c728a0 c6c72984 c6c728a0 
00000000 c74300dc c7439f60
[57160.070000] 9ee0: c7430000 c7439f34 c7438000 c7439f78 00000005 
c7439f0c c001b88c c001ac78
[57160.070000] 9f00: 20000013 ffffffff
[57160.070000] [<c0008e84>] (__irq_svc+0x44/0x54) from [<c001ac78>] 
(wait_consider_task+0x4/0xb14)
[57160.070000] [<c001ac78>] (wait_consider_task+0x4/0xb14) from 
[<c7438000>] (0xc7438000)




More information about the linux-mtd mailing list