blktests failures with v5.19-rc1
Chaitanya Kulkarni
chaitanyak at nvidia.com
Fri Jun 10 02:22:25 PDT 2022
Thanks a lot for running tests...
> #3: nvme/002: Failed 3 times with 3 trials.
>
> runtime ... 85.697s
> --- tests/nvme/002.out 2022-06-02 10:18:53.538739780 +0900
> +++ /home/shin/kts/kernel-test-suite/src/blktests/results/nodev/nvme/002.out.bad 2022-06-09 09:51:45.714027241 +0900
> @@ -1,3003 +1,3006 @@
> Running nvme/002
> -Discovery Log Number of Records 1000, Generation counter X
> +Discovery Log Number of Records 1001, Generation counter X
> =====Discovery Log Entry 0======
> trtype: loop
> -subnqn: blktests-subsystem-0
> +subnqn: nqn.2014-08.org.nvmexpress.discovery
> ...
This needs a fix in the testcase, I'll send it shortly..
I guess similar fix has been posted not sure...
>
> #4: nvme/016: Failed 3 times with 3 trials.
>
> runtime 36.240s ... 35.984s
> --- tests/nvme/016.out 2022-06-02 10:18:53.541739780 +0900
> +++ /home/shin/kts/kernel-test-suite/src/blktests/results/nodev/nvme/016.out.bad 2022-06-09 10:06:11.797004672 +0900
> @@ -1,6 +1,9 @@
> Running nvme/016
> -Discovery Log Number of Records 1, Generation counter X
> +Discovery Log Number of Records 2, Generation counter X
> =====Discovery Log Entry 0======
> trtype: loop
> +subnqn: nqn.2014-08.org.nvmexpress.discovery
> +=====Discovery Log Entry 1======
> ...
>
same as above...
> #5: nvme/017: Failed 3 times with 3 trials.
>
> runtime 43.724s ... 42.912s
> --- tests/nvme/017.out 2022-06-02 10:18:53.541739780 +0900
> +++ /home/shin/kts/kernel-test-suite/src/blktests/results/nodev/nvme/017.out.bad 2022-06-09 10:09:18.016394996 +0900
> @@ -1,6 +1,9 @@
> Running nvme/017
> -Discovery Log Number of Records 1, Generation counter X
> +Discovery Log Number of Records 2, Generation counter X
> =====Discovery Log Entry 0======
> trtype: loop
> +subnqn: nqn.2014-08.org.nvmexpress.discovery
> +=====Discovery Log Entry 1======
> ...
>
same above ...
> #6: nvme/032: Failed at the first run after system reboot.
> Used QEMU NVME device as TEST_DEV.
>
> runtime ... 8.458s
> something found in dmesg:
> [ 1589.976481] run blktests nvme/032 at 2022-06-09 10:57:20
>
> [ 1597.221547] ======================================================
> [ 1597.221549] WARNING: possible circular locking dependency detected
> [ 1597.221551] 5.19.0-rc1 #1 Not tainted
> [ 1597.221554] ------------------------------------------------------
> [ 1597.221554] check/970 is trying to acquire lock:
> [ 1597.221556] ffff8881026f8cb8 (kn->active#227){++++}-{0:0}, at: kernfs_remove_by_name_ns+0x90/0xe0
> [ 1597.221580]
> but task is already holding lock:
> ...
>
> dmesg output:
now this is a real problem we need to look into ..
> [ 1589.976481] run blktests nvme/032 at 2022-06-09 10:57:20
>
> [ 1597.221547] ======================================================
> [ 1597.221549] WARNING: possible circular locking dependency detected
> [ 1597.221551] 5.19.0-rc1 #1 Not tainted
> [ 1597.221554] ------------------------------------------------------
> [ 1597.221554] check/970 is trying to acquire lock:
> [ 1597.221556] ffff8881026f8cb8 (kn->active#227){++++}-{0:0}, at: kernfs_remove_by_name_ns+0x90/0xe0
> [ 1597.221580]
> but task is already holding lock:
> [ 1597.221580] ffffffff889c0668 (pci_rescan_remove_lock){+.+.}-{3:3}, at: pci_stop_and_remove_bus_device_locked+0xe/0x30
> [ 1597.221590]
> which lock already depends on the new lock.
>
> [ 1597.221591]
> the existing dependency chain (in reverse order) is:
> [ 1597.221592]
> -> #1 (pci_rescan_remove_lock){+.+.}-{3:3}:
> [ 1597.221598] __mutex_lock+0x14c/0x12b0
> [ 1597.221604] dev_rescan_store+0x94/0xd0
> [ 1597.221607] kernfs_fop_write_iter+0x34f/0x520
> [ 1597.221610] new_sync_write+0x2ca/0x500
> [ 1597.221614] vfs_write+0x62d/0x980
> [ 1597.221616] ksys_write+0xe7/0x1b0
> [ 1597.221619] do_syscall_64+0x37/0x80
> [ 1597.221622] entry_SYSCALL_64_after_hwframe+0x46/0xb0
> [ 1597.221625]
> -> #0 (kn->active#227){++++}-{0:0}:
> [ 1597.221629] __lock_acquire+0x2875/0x5510
> [ 1597.221634] lock_acquire+0x194/0x4f0
> [ 1597.221636] __kernfs_remove+0x6f3/0x910
> [ 1597.221638] kernfs_remove_by_name_ns+0x90/0xe0
> [ 1597.221641] remove_files+0x8c/0x1a0
> [ 1597.221644] sysfs_remove_group+0x77/0x150
> [ 1597.221646] sysfs_remove_groups+0x4f/0x90
> [ 1597.221649] device_remove_attrs+0x19e/0x240
> [ 1597.221652] device_del+0x492/0xb60
> [ 1597.221654] pci_remove_bus_device+0x12c/0x350
> [ 1597.221656] pci_stop_and_remove_bus_device_locked+0x1e/0x30
> [ 1597.221659] remove_store+0xac/0xc0
> [ 1597.221662] kernfs_fop_write_iter+0x34f/0x520
> [ 1597.221664] new_sync_write+0x2ca/0x500
> [ 1597.221666] vfs_write+0x62d/0x980
> [ 1597.221669] ksys_write+0xe7/0x1b0
> [ 1597.221671] do_syscall_64+0x37/0x80
> [ 1597.221673] entry_SYSCALL_64_after_hwframe+0x46/0xb0
> [ 1597.221675]
> other info that might help us debug this:
>
> [ 1597.221676] Possible unsafe locking scenario:
>
> [ 1597.221677] CPU0 CPU1
> [ 1597.221678] ---- ----
> [ 1597.221678] lock(pci_rescan_remove_lock);
> [ 1597.221680] lock(kn->active#227);
> [ 1597.221683] lock(pci_rescan_remove_lock);
> [ 1597.221685] lock(kn->active#227);
> [ 1597.221687]
> *** DEADLOCK ***
>
> [ 1597.221688] 3 locks held by check/970:
> [ 1597.221689] #0: ffff888110106460 (sb_writers#4){.+.+}-{0:0}, at: ksys_write+0xe7/0x1b0
> [ 1597.221697] #1: ffff888120f69088 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x216/0x520
> [ 1597.221703] #2: ffffffff889c0668 (pci_rescan_remove_lock){+.+.}-{3:3}, at: pci_stop_and_remove_bus_device_locked+0xe/0x30
> [ 1597.221709]
> stack backtrace:
> [ 1597.221714] CPU: 1 PID: 970 Comm: check Not tainted 5.19.0-rc1 #1
> [ 1597.221717] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
> [ 1597.221723] Call Trace:
> [ 1597.221728] <TASK>
> [ 1597.221732] dump_stack_lvl+0x5b/0x74
> [ 1597.221742] check_noncircular+0x23f/0x2e0
> [ 1597.221745] ? lock_chain_count+0x20/0x20
> [ 1597.221748] ? print_circular_bug+0x1e0/0x1e0
> [ 1597.221751] ? mark_lock+0xee/0x1610
> [ 1597.221754] ? mark_lock+0xee/0x1610
> [ 1597.221759] ? lockdep_lock+0xb8/0x1a0
> [ 1597.221762] ? call_rcu_zapped+0xb0/0xb0
> [ 1597.221766] __lock_acquire+0x2875/0x5510
> [ 1597.221773] ? lockdep_hardirqs_on_prepare+0x410/0x410
> [ 1597.221779] lock_acquire+0x194/0x4f0
> [ 1597.221782] ? kernfs_remove_by_name_ns+0x90/0xe0
> [ 1597.221786] ? lock_downgrade+0x6b0/0x6b0
> [ 1597.221791] ? up_write+0x14d/0x460
> [ 1597.221795] __kernfs_remove+0x6f3/0x910
> [ 1597.221798] ? kernfs_remove_by_name_ns+0x90/0xe0
> [ 1597.221803] ? kernfs_next_descendant_post+0x280/0x280
> [ 1597.221807] ? lock_is_held_type+0xe3/0x140
> [ 1597.221811] ? kernfs_name_hash+0x16/0xc0
> [ 1597.221815] ? kernfs_find_ns+0x1e3/0x330
> [ 1597.221819] kernfs_remove_by_name_ns+0x90/0xe0
> [ 1597.221822] remove_files+0x8c/0x1a0
> [ 1597.221826] sysfs_remove_group+0x77/0x150
> [ 1597.221831] sysfs_remove_groups+0x4f/0x90
> [ 1597.221835] device_remove_attrs+0x19e/0x240
> [ 1597.221838] ? device_remove_file+0x20/0x20
> [ 1597.221842] device_del+0x492/0xb60
> [ 1597.221846] ? __device_link_del+0x350/0x350
> [ 1597.221848] ? kfree+0xc5/0x340
> [ 1597.221856] pci_remove_bus_device+0x12c/0x350
> [ 1597.221860] pci_stop_and_remove_bus_device_locked+0x1e/0x30
> [ 1597.221863] remove_store+0xac/0xc0
> [ 1597.221867] ? subordinate_bus_number_show+0xa0/0xa0
> [ 1597.221870] ? sysfs_kf_write+0x3d/0x170
> [ 1597.221874] kernfs_fop_write_iter+0x34f/0x520
> [ 1597.221881] new_sync_write+0x2ca/0x500
> [ 1597.221885] ? new_sync_read+0x500/0x500
> [ 1597.221888] ? perf_callchain_user+0x7c0/0xaa0
> [ 1597.221893] ? lock_downgrade+0x6b0/0x6b0
> [ 1597.221896] ? inode_security+0x54/0xf0
> [ 1597.221903] ? lock_is_held_type+0xe3/0x140
> [ 1597.221909] vfs_write+0x62d/0x980
> [ 1597.221913] ksys_write+0xe7/0x1b0
> [ 1597.221916] ? __ia32_sys_read+0xa0/0xa0
> [ 1597.221919] ? syscall_enter_from_user_mode+0x20/0x70
> [ 1597.221925] do_syscall_64+0x37/0x80
> [ 1597.221928] entry_SYSCALL_64_after_hwframe+0x46/0xb0
> [ 1597.221931] RIP: 0033:0x7f5608901c17
> [ 1597.221939] Code: 0f 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
> [ 1597.221942] RSP: 002b:00007fff89480c08 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> [ 1597.221945] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f5608901c17
> [ 1597.221948] RDX: 0000000000000002 RSI: 000055572288f080 RDI: 0000000000000001
> [ 1597.221949] RBP: 000055572288f080 R08: 0000000000000000 R09: 0000000000000073
> [ 1597.221951] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000002
> [ 1597.221953] R13: 00007f56089f8780 R14: 0000000000000002 R15: 00007f56089f39e0
> [ 1597.221960] </TASK>
> [ 1597.227490] pci 0000:00:09.0: [1b36:0010] type 00 class 0x010802
> [ 1597.230489] pci 0000:00:09.0: reg 0x10: [mem 0xfebc4000-0xfebc7fff 64bit]
> [ 1597.278136] pci 0000:00:09.0: BAR 0: assigned [mem 0x640000000-0x640003fff 64bit]
> [ 1597.283549] nvme nvme5: pci function 0000:00:09.0
> [ 1598.372141] nvme nvme5: 10/0/0 default/read/poll queues
> [ 1598.375349] nvme nvme5: Ignoring bogus Namespace Identifiers
> [ 1618.928153] run blktests nvme/032 at 2022-06-09 10:57:49
> [ 1625.557584] pci 0000:00:09.0: [1b36:0010] type 00 class 0x010802
> [ 1625.558348] pci 0000:00:09.0: reg 0x10: [mem 0x640000000-0x640003fff 64bit]
> [ 1625.577856] pci 0000:00:09.0: BAR 0: assigned [mem 0x640000000-0x640003fff 64bit]
> [ 1625.581168] nvme nvme5: pci function 0000:00:09.0
> [ 1626.695897] nvme nvme5: 10/0/0 default/read/poll queues
> [ 1626.701014] nvme nvme5: Ignoring bogus Namespace Identifiers
>
More information about the Linux-nvme
mailing list