blktests failures with v5.19-rc1

Shinichiro Kawasaki shinichiro.kawasaki at wdc.com
Fri Jun 10 05:25:17 PDT 2022


On Jun 10, 2022 / 09:32, Chaitanya Kulkarni wrote:
> Shinichiro,
> 
> > 
> >> #6: nvme/032: Failed at the first run after system reboot.
> >>                 Used QEMU NVME device as TEST_DEV.
> >>
> 
> of course we need to fix this issue, but can you also
> try it with the real H/W?

Hi Chaitanya, thank you for looking into the failures. I have just run the test
case nvme/032 with a real NVMe device and observed exactly the same symptom as
with the QEMU NVMe device.
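
For reference, a minimal sketch of how the real-HW run is set up; the device
path below is an assumption for illustration, not copied from my environment:

    # blktests ./config file: point TEST_DEVS at the real NVMe namespace
    TEST_DEVS=(/dev/nvme0n1)

    # run only the failing test case
    ./check nvme/032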

FYI, here is the kernel WARNING observed with the real HW.

[  170.567809] run blktests nvme/032 at 2022-06-10 21:19:03
[  175.771062] nvme nvme0: 8/0/0 default/read/poll queues

[  176.017635] ======================================================
[  176.017637] WARNING: possible circular locking dependency detected
[  176.017639] 5.19.0-rc1+ #1 Not tainted
[  176.017643] ------------------------------------------------------
[  176.017645] check/1147 is trying to acquire lock:
[  176.017651] ffff888107010cb8 (kn->active#256){++++}-{0:0}, at: kernfs_remove_by_name_ns+0x90/0xe0
[  176.017683] 
               but task is already holding lock:
[  176.017685] ffffffff859bbcc8 (pci_rescan_remove_lock){+.+.}-{3:3}, at: pci_stop_and_remove_bus_device_locked+0xe/0x30
[  176.017701] 
               which lock already depends on the new lock.

[  176.017702] 
               the existing dependency chain (in reverse order) is:
[  176.017704] 
               -> #1 (pci_rescan_remove_lock){+.+.}-{3:3}:
[  176.017712]        __mutex_lock+0x14c/0x12b0
[  176.017720]        dev_rescan_store+0x93/0xd0
[  176.017727]        kernfs_fop_write_iter+0x351/0x520
[  176.017732]        new_sync_write+0x2cd/0x500
[  176.017737]        vfs_write+0x62c/0x980
[  176.017742]        ksys_write+0xe7/0x1a0
[  176.017746]        do_syscall_64+0x3a/0x80
[  176.017755]        entry_SYSCALL_64_after_hwframe+0x46/0xb0
[  176.017762] 
               -> #0 (kn->active#256){++++}-{0:0}:
[  176.017770]        __lock_acquire+0x2874/0x5510
[  176.017778]        lock_acquire+0x194/0x4f0
[  176.017782]        __kernfs_remove+0x6f2/0x910
[  176.017786]        kernfs_remove_by_name_ns+0x90/0xe0
[  176.017791]        remove_files+0x8c/0x1a0
[  176.017797]        sysfs_remove_group+0x77/0x150
[  176.017802]        sysfs_remove_groups+0x4f/0x90
[  176.017807]        device_remove_attrs+0x19e/0x240
[  176.017812]        device_del+0x492/0xb60
[  176.017816]        pci_remove_bus_device+0x12c/0x350
[  176.017821]        pci_stop_and_remove_bus_device_locked+0x1e/0x30
[  176.017826]        remove_store+0xab/0xc0
[  176.017831]        kernfs_fop_write_iter+0x351/0x520
[  176.017836]        new_sync_write+0x2cd/0x500
[  176.017839]        vfs_write+0x62c/0x980
[  176.017844]        ksys_write+0xe7/0x1a0
[  176.017848]        do_syscall_64+0x3a/0x80
[  176.017854]        entry_SYSCALL_64_after_hwframe+0x46/0xb0
[  176.017861] 
               other info that might help us debug this:

[  176.017862]  Possible unsafe locking scenario:

[  176.017864]        CPU0                    CPU1
[  176.017865]        ----                    ----
[  176.017866]   lock(pci_rescan_remove_lock);
[  176.017870]                                lock(kn->active#256);
[  176.017875]                                lock(pci_rescan_remove_lock);
[  176.017879]   lock(kn->active#256);
[  176.017883] 
                *** DEADLOCK ***

[  176.017884] 3 locks held by check/1147:
[  176.017888]  #0: ffff888119ad8460 (sb_writers#3){.+.+}-{0:0}, at: ksys_write+0xe7/0x1a0
[  176.017902]  #1: ffff8881241b1888 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x215/0x520
[  176.017914]  #2: ffffffff859bbcc8 (pci_rescan_remove_lock){+.+.}-{3:3}, at: pci_stop_and_remove_bus_device_locked+0xe/0x30
[  176.017927] 
               stack backtrace:
[  176.017929] CPU: 7 PID: 1147 Comm: check Not tainted 5.19.0-rc1+ #1
[  176.017935] Hardware name: Supermicro Super Server/X10SRL-F, BIOS 3.2 11/22/2019
[  176.017938] Call Trace:
[  176.017941]  <TASK>
[  176.017945]  dump_stack_lvl+0x5b/0x73
[  176.017953]  check_noncircular+0x23e/0x2e0
[  176.017959]  ? lock_chain_count+0x20/0x20
[  176.017966]  ? print_circular_bug+0x1e0/0x1e0
[  176.017973]  ? mark_lock+0xee/0x1600
[  176.017983]  ? lockdep_lock+0xb8/0x1a0
[  176.017989]  ? call_rcu_zapped+0xb0/0xb0
[  176.017997]  __lock_acquire+0x2874/0x5510
[  176.018012]  ? lockdep_hardirqs_on_prepare+0x410/0x410
[  176.018023]  lock_acquire+0x194/0x4f0
[  176.018029]  ? kernfs_remove_by_name_ns+0x90/0xe0
[  176.018037]  ? lock_downgrade+0x6b0/0x6b0
[  176.018048]  ? up_write+0x14d/0x460
[  176.018055]  __kernfs_remove+0x6f2/0x910
[  176.018060]  ? kernfs_remove_by_name_ns+0x90/0xe0
[  176.018071]  ? kernfs_next_descendant_post+0x280/0x280
[  176.018078]  ? lock_is_held_type+0xe3/0x140
[  176.018085]  ? kernfs_name_hash+0x16/0xc0
[  176.018092]  ? kernfs_find_ns+0x1e3/0x320
[  176.018100]  kernfs_remove_by_name_ns+0x90/0xe0
[  176.018107]  remove_files+0x8c/0x1a0
[  176.018115]  sysfs_remove_group+0x77/0x150
[  176.018123]  sysfs_remove_groups+0x4f/0x90
[  176.018131]  device_remove_attrs+0x19e/0x240
[  176.018137]  ? device_remove_file+0x20/0x20
[  176.018146]  device_del+0x492/0xb60
[  176.018154]  ? __device_link_del+0x350/0x350
[  176.018159]  ? kfree+0xc5/0x340
[  176.018171]  pci_remove_bus_device+0x12c/0x350
[  176.018179]  pci_stop_and_remove_bus_device_locked+0x1e/0x30
[  176.018186]  remove_store+0xab/0xc0
[  176.018192]  ? subordinate_bus_number_show+0xa0/0xa0
[  176.018199]  ? sysfs_kf_write+0x3d/0x170
[  176.018207]  kernfs_fop_write_iter+0x351/0x520
[  176.018216]  new_sync_write+0x2cd/0x500
[  176.018223]  ? new_sync_read+0x500/0x500
[  176.018230]  ? perf_callchain_user+0x810/0xa90
[  176.018238]  ? lock_downgrade+0x6b0/0x6b0
[  176.018244]  ? inode_security+0x54/0xf0
[  176.018254]  ? lock_is_held_type+0xe3/0x140
[  176.018264]  vfs_write+0x62c/0x980
[  176.018273]  ksys_write+0xe7/0x1a0
[  176.018279]  ? __ia32_sys_read+0xa0/0xa0
[  176.018285]  ? syscall_enter_from_user_mode+0x20/0x70
[  176.018294]  do_syscall_64+0x3a/0x80
[  176.018303]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
[  176.018310] RIP: 0033:0x7fd59a901c17
[  176.018317] Code: 0f 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
[  176.018322] RSP: 002b:00007fffe29bbe98 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  176.018328] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fd59a901c17
[  176.018333] RDX: 0000000000000002 RSI: 0000555e68a93c20 RDI: 0000000000000001
[  176.018336] RBP: 0000555e68a93c20 R08: 0000000000000000 R09: 0000000000000073
[  176.018339] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000002
[  176.018342] R13: 00007fd59a9f8780 R14: 0000000000000002 R15: 00007fd59a9f39e0
[  176.018356]  </TASK>
[  176.057966] pci 0000:04:00.0: [15b7:5002] type 00 class 0x010802
[  176.058006] pci 0000:04:00.0: reg 0x10: [mem 0xfb600000-0xfb603fff 64bit]
[  176.058059] pci 0000:04:00.0: reg 0x20: [mem 0xfb604000-0xfb6040ff 64bit]
[  176.060129] pci 0000:04:00.0: BAR 0: assigned [mem 0xfb600000-0xfb603fff 64bit]
[  176.060152] pci 0000:04:00.0: BAR 4: assigned [mem 0xfb604000-0xfb6040ff 64bit]
[  176.061638] nvme nvme0: pci function 0000:04:00.0
[  176.074114] nvme nvme0: 8/0/0 default/read/poll queues
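
For context, the two call chains in the report correspond to writes to the PCI
device's "remove" and "rescan" sysfs attributes. A sketch of that kind of
sequence, using the device address from the log above (whether the test case
issues exactly these two writes is my reading of the backtraces, not verified
against the test source):

    # remove_store: takes pci_rescan_remove_lock, then drains kn->active
    # while deleting the device's sysfs attributes
    echo 1 > /sys/bus/pci/devices/0000:04:00.0/remove

    # dev_rescan_store: runs with the attribute's kn->active held, then
    # takes pci_rescan_remove_lock -- the reverse ordering
    echo 1 > /sys/bus/pci/devices/0000:04:00.0/rescan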

-- 
Shin'ichiro Kawasaki

