blktests failures with v5.19-rc1
Shinichiro Kawasaki
shinichiro.kawasaki at wdc.com
Fri Jun 10 05:25:17 PDT 2022
On Jun 10, 2022 / 09:32, Chaitanya Kulkarni wrote:
> Shinichiro,
>
> >
> >> #6: nvme/032: Failed at the first run after system reboot.
> >> Used QEMU NVME device as TEST_DEV.
> >>
>
> of course we need to fix this issue, but can you also
> try it with the real H/W?
Hi Chaitanya, thank you for looking into the failures. I have just run the test
case nvme/032 with a real NVMe device and observed exactly the same symptom as
with the QEMU NVMe device.
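
For context, nvme/032 exercises removal and rescan of the PCI function through
sysfs, which is exactly the remove_store()/dev_rescan_store() pair that shows up
in the lockdep chains below. Here is a minimal C sketch of that trigger; the PCI
address 0000:04:00.0 is taken from the log below, and the exact sequence the
test uses may differ, so treat this as an approximation rather than a verbatim
copy of the test case:

#include <fcntl.h>
#include <unistd.h>

/* Write "1" to a sysfs attribute; this invokes the attribute's
 * ->store() handler in the kernel. Error handling trimmed. */
static void write_one(const char *path)
{
	int fd = open(path, O_WRONLY);

	if (fd >= 0) {
		write(fd, "1", 1);
		close(fd);
	}
}

int main(void)
{
	/* remove_store(): tears down the device while holding
	 * pci_rescan_remove_lock (chain #0 below) */
	write_one("/sys/bus/pci/devices/0000:04:00.0/remove");
	/* bring the device back; the bus-wide rescan is the simpler
	 * variant, while the per-device rescan file maps to
	 * dev_rescan_store(), which is what appears in chain #1 */
	write_one("/sys/bus/pci/rescan");
	return 0;
}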
FYI, here I quote the kernel WARNING observed with the real HW.
[ 170.567809] run blktests nvme/032 at 2022-06-10 21:19:03
[ 175.771062] nvme nvme0: 8/0/0 default/read/poll queues
[ 176.017635] ======================================================
[ 176.017637] WARNING: possible circular locking dependency detected
[ 176.017639] 5.19.0-rc1+ #1 Not tainted
[ 176.017643] ------------------------------------------------------
[ 176.017645] check/1147 is trying to acquire lock:
[ 176.017651] ffff888107010cb8 (kn->active#256){++++}-{0:0}, at: kernfs_remove_by_name_ns+0x90/0xe0
[ 176.017683]
but task is already holding lock:
[ 176.017685] ffffffff859bbcc8 (pci_rescan_remove_lock){+.+.}-{3:3}, at: pci_stop_and_remove_bus_device_locked+0xe/0x30
[ 176.017701]
which lock already depends on the new lock.
[ 176.017702]
the existing dependency chain (in reverse order) is:
[ 176.017704]
-> #1 (pci_rescan_remove_lock){+.+.}-{3:3}:
[ 176.017712] __mutex_lock+0x14c/0x12b0
[ 176.017720] dev_rescan_store+0x93/0xd0
[ 176.017727] kernfs_fop_write_iter+0x351/0x520
[ 176.017732] new_sync_write+0x2cd/0x500
[ 176.017737] vfs_write+0x62c/0x980
[ 176.017742] ksys_write+0xe7/0x1a0
[ 176.017746] do_syscall_64+0x3a/0x80
[ 176.017755] entry_SYSCALL_64_after_hwframe+0x46/0xb0
[ 176.017762]
-> #0 (kn->active#256){++++}-{0:0}:
[ 176.017770] __lock_acquire+0x2874/0x5510
[ 176.017778] lock_acquire+0x194/0x4f0
[ 176.017782] __kernfs_remove+0x6f2/0x910
[ 176.017786] kernfs_remove_by_name_ns+0x90/0xe0
[ 176.017791] remove_files+0x8c/0x1a0
[ 176.017797] sysfs_remove_group+0x77/0x150
[ 176.017802] sysfs_remove_groups+0x4f/0x90
[ 176.017807] device_remove_attrs+0x19e/0x240
[ 176.017812] device_del+0x492/0xb60
[ 176.017816] pci_remove_bus_device+0x12c/0x350
[ 176.017821] pci_stop_and_remove_bus_device_locked+0x1e/0x30
[ 176.017826] remove_store+0xab/0xc0
[ 176.017831] kernfs_fop_write_iter+0x351/0x520
[ 176.017836] new_sync_write+0x2cd/0x500
[ 176.017839] vfs_write+0x62c/0x980
[ 176.017844] ksys_write+0xe7/0x1a0
[ 176.017848] do_syscall_64+0x3a/0x80
[ 176.017854] entry_SYSCALL_64_after_hwframe+0x46/0xb0
[ 176.017861]
other info that might help us debug this:
[ 176.017862] Possible unsafe locking scenario:
[ 176.017864]        CPU0                    CPU1
[ 176.017865]        ----                    ----
[ 176.017866]   lock(pci_rescan_remove_lock);
[ 176.017870]                                lock(kn->active#256);
[ 176.017875]                                lock(pci_rescan_remove_lock);
[ 176.017879]   lock(kn->active#256);
[ 176.017883]
*** DEADLOCK ***
[ 176.017884] 3 locks held by check/1147:
[ 176.017888] #0: ffff888119ad8460 (sb_writers#3){.+.+}-{0:0}, at: ksys_write+0xe7/0x1a0
[ 176.017902] #1: ffff8881241b1888 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x215/0x520
[ 176.017914] #2: ffffffff859bbcc8 (pci_rescan_remove_lock){+.+.}-{3:3}, at: pci_stop_and_remove_bus_device_locked+0xe/0x30
[ 176.017927]
stack backtrace:
[ 176.017929] CPU: 7 PID: 1147 Comm: check Not tainted 5.19.0-rc1+ #1
[ 176.017935] Hardware name: Supermicro Super Server/X10SRL-F, BIOS 3.2 11/22/2019
[ 176.017938] Call Trace:
[ 176.017941] <TASK>
[ 176.017945] dump_stack_lvl+0x5b/0x73
[ 176.017953] check_noncircular+0x23e/0x2e0
[ 176.017959] ? lock_chain_count+0x20/0x20
[ 176.017966] ? print_circular_bug+0x1e0/0x1e0
[ 176.017973] ? mark_lock+0xee/0x1600
[ 176.017983] ? lockdep_lock+0xb8/0x1a0
[ 176.017989] ? call_rcu_zapped+0xb0/0xb0
[ 176.017997] __lock_acquire+0x2874/0x5510
[ 176.018012] ? lockdep_hardirqs_on_prepare+0x410/0x410
[ 176.018023] lock_acquire+0x194/0x4f0
[ 176.018029] ? kernfs_remove_by_name_ns+0x90/0xe0
[ 176.018037] ? lock_downgrade+0x6b0/0x6b0
[ 176.018048] ? up_write+0x14d/0x460
[ 176.018055] __kernfs_remove+0x6f2/0x910
[ 176.018060] ? kernfs_remove_by_name_ns+0x90/0xe0
[ 176.018071] ? kernfs_next_descendant_post+0x280/0x280
[ 176.018078] ? lock_is_held_type+0xe3/0x140
[ 176.018085] ? kernfs_name_hash+0x16/0xc0
[ 176.018092] ? kernfs_find_ns+0x1e3/0x320
[ 176.018100] kernfs_remove_by_name_ns+0x90/0xe0
[ 176.018107] remove_files+0x8c/0x1a0
[ 176.018115] sysfs_remove_group+0x77/0x150
[ 176.018123] sysfs_remove_groups+0x4f/0x90
[ 176.018131] device_remove_attrs+0x19e/0x240
[ 176.018137] ? device_remove_file+0x20/0x20
[ 176.018146] device_del+0x492/0xb60
[ 176.018154] ? __device_link_del+0x350/0x350
[ 176.018159] ? kfree+0xc5/0x340
[ 176.018171] pci_remove_bus_device+0x12c/0x350
[ 176.018179] pci_stop_and_remove_bus_device_locked+0x1e/0x30
[ 176.018186] remove_store+0xab/0xc0
[ 176.018192] ? subordinate_bus_number_show+0xa0/0xa0
[ 176.018199] ? sysfs_kf_write+0x3d/0x170
[ 176.018207] kernfs_fop_write_iter+0x351/0x520
[ 176.018216] new_sync_write+0x2cd/0x500
[ 176.018223] ? new_sync_read+0x500/0x500
[ 176.018230] ? perf_callchain_user+0x810/0xa90
[ 176.018238] ? lock_downgrade+0x6b0/0x6b0
[ 176.018244] ? inode_security+0x54/0xf0
[ 176.018254] ? lock_is_held_type+0xe3/0x140
[ 176.018264] vfs_write+0x62c/0x980
[ 176.018273] ksys_write+0xe7/0x1a0
[ 176.018279] ? __ia32_sys_read+0xa0/0xa0
[ 176.018285] ? syscall_enter_from_user_mode+0x20/0x70
[ 176.018294] do_syscall_64+0x3a/0x80
[ 176.018303] entry_SYSCALL_64_after_hwframe+0x46/0xb0
[ 176.018310] RIP: 0033:0x7fd59a901c17
[ 176.018317] Code: 0f 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
[ 176.018322] RSP: 002b:00007fffe29bbe98 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 176.018328] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fd59a901c17
[ 176.018333] RDX: 0000000000000002 RSI: 0000555e68a93c20 RDI: 0000000000000001
[ 176.018336] RBP: 0000555e68a93c20 R08: 0000000000000000 R09: 0000000000000073
[ 176.018339] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000002
[ 176.018342] R13: 00007fd59a9f8780 R14: 0000000000000002 R15: 00007fd59a9f39e0
[ 176.018356] </TASK>
[ 176.057966] pci 0000:04:00.0: [15b7:5002] type 00 class 0x010802
[ 176.058006] pci 0000:04:00.0: reg 0x10: [mem 0xfb600000-0xfb603fff 64bit]
[ 176.058059] pci 0000:04:00.0: reg 0x20: [mem 0xfb604000-0xfb6040ff 64bit]
[ 176.060129] pci 0000:04:00.0: BAR 0: assigned [mem 0xfb600000-0xfb603fff 64bit]
[ 176.060152] pci 0000:04:00.0: BAR 4: assigned [mem 0xfb604000-0xfb6040ff 64bit]
[ 176.061638] nvme nvme0: pci function 0000:04:00.0
[ 176.074114] nvme nvme0: 8/0/0 default/read/poll queues
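
To spell out the cycle lockdep found: dev_rescan_store() runs with the kernfs
active reference of its sysfs file held and then takes pci_rescan_remove_lock
(chain #1), while remove_store() takes pci_rescan_remove_lock and then, via
device_del() -> __kernfs_remove(), has to drain the active references of the
attributes it deletes (chain #0). Below is a minimal userspace sketch of the
same AB-BA pattern, with plain pthread mutexes standing in for the two kernel
locks (illustration only, not kernel code; the kernfs active reference is
really a refcount-like construct, not a mutex):

#include <pthread.h>

/* kn_active stands in for the kernfs active reference (kn->active#256),
 * rescan_remove for pci_rescan_remove_lock. Both names are made up for
 * this illustration. */
static pthread_mutex_t kn_active = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t rescan_remove = PTHREAD_MUTEX_INITIALIZER;

/* Chain #1: dev_rescan_store() is entered with its sysfs file's active
 * reference held, then takes pci_rescan_remove_lock. */
static void *rescan_path(void *arg)
{
	pthread_mutex_lock(&kn_active);
	pthread_mutex_lock(&rescan_remove);
	pthread_mutex_unlock(&rescan_remove);
	pthread_mutex_unlock(&kn_active);
	return arg;
}

/* Chain #0: remove_store() takes pci_rescan_remove_lock, then
 * device_del() -> __kernfs_remove() waits for active references on the
 * attributes being deleted. */
static void *remove_path(void *arg)
{
	pthread_mutex_lock(&rescan_remove);
	pthread_mutex_lock(&kn_active);
	pthread_mutex_unlock(&kn_active);
	pthread_mutex_unlock(&rescan_remove);
	return arg;
}

int main(void)
{
	pthread_t t;

	/* Run the two paths one after the other: nothing ever hangs, but
	 * the conflicting lock order is still established, which is why
	 * lockdep can report the splat above without an actual deadlock. */
	pthread_create(&t, NULL, rescan_path, NULL);
	pthread_join(t, NULL);
	pthread_create(&t, NULL, remove_path, NULL);
	pthread_join(t, NULL);
	return 0;
}

Build with "gcc -pthread"; running it under a lock-order checker such as
ThreadSanitizer (-fsanitize=thread) should flag the inversion even though this
sequential run never deadlocks.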
--
Shin'ichiro Kawasaki