blktests failures with v6.15-rc1 kernel
Shinichiro Kawasaki
shinichiro.kawasaki at wdc.com
Wed Apr 9 21:41:20 PDT 2025
Hi all,
I ran the latest blktests (git hash: 9a9a365f389f) with the v6.15-rc1 kernel and
observed the 4 failures listed below. Compared with the previous report for the
v6.14-rc1 kernel [1], 3 failures are no longer observed (block/002, nvme/058 and
zbd/009), and 2 new failures are observed (the rxe driver hang and the
q_usage_counter WARN during system boot).
[1] https://lore.kernel.org/linux-block/uyijd3ufbrfbiyyaajvhyhdyytssubekvymzgyiqjqmkh33cmi@ksqjpewsqlvw/
List of failures
================
#1: rxe driver test hangs (new: nvme rdma transport, rnbd, srp)
#2: nvme/037 (fc transport)
#3: nvme/041 (fc transport)
#4: q_usage_counter WARN during system boot (new)
Failure description
===================
#1: rxe driver test hangs (new: nvme rdma transport, rnbd, srp)
When use_rxe=1 is set in the blktests config, blktests runs trigger null pointer
dereferences and hang at the test cases which use the rdma transport. The hang
is observed in the nvme, rnbd and srp test groups.
Li Zhijian provided the fix patch (thanks!) [2], and it has already been applied
to the v6.15-rcX releases.
[2] https://lore.kernel.org/linux-rdma/20250402032657.1762800-1-lizhijian@fujitsu.com/
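For reference, the rdma runs above correspond to a blktests config along these
lines (a sketch based on my setup; blktests sources the "config" file at its top
directory as shell, and variable spellings may differ between blktests versions):

```shell
# blktests config sketch for reproducing the rxe hang.
# use_rxe selects the rdma_rxe (soft-RoCE) driver for the RDMA tests.
use_rxe=1
nvme_trtype=rdma

# Then run the affected groups, e.g.:
# ./check nvme rnbd srp
```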
#2: nvme/037 (fc transport)
#3: nvme/041 (fc transport)
These two test cases fail for the fc transport. Refer to the report for the
v6.12 kernel [3].
[3] https://lore.kernel.org/linux-nvme/6crydkodszx5vq4ieox3jjpwkxtu7mhbohypy24awlo5w7f4k6@to3dcng24rd4/
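For completeness, the fc-transport runs correspond to a blktests config along
these lines (a sketch; the variable name follows my setup and may differ between
blktests versions):

```shell
# Select the fc transport for the nvme test group (sketch, per my setup).
nvme_trtype=fc

# Run the two failing cases:
# ./check nvme/037 nvme/041
```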
#4: q_usage_counter WARN during system boot (new)
To be precise, this is not a blktests failure, but I found it on my blktests
test systems. During the system boot process, a lockdep WARN relevant to
q_usage_counter was reported [4]. I think this needs further debugging.
[4]
Apr 10 10:24:19 testnode2 kernel: ======================================================
Apr 10 10:24:19 testnode2 kernel: WARNING: possible circular locking dependency detected
Apr 10 10:24:19 testnode2 kernel: 6.15.0-rc1+ #22 Not tainted
Apr 10 10:24:19 testnode2 kernel: ------------------------------------------------------
Apr 10 10:24:19 testnode2 kernel: (udev-worker)/556 is trying to acquire lock:
Apr 10 10:24:19 testnode2 kernel: ffff8881241bb698 (&q->elevator_lock){+.+.}-{4:4}, at: elv_iosched_store+0x178/0x4d0
Apr 10 10:24:19 testnode2 kernel:
but task is already holding lock:
Apr 10 10:24:19 testnode2 kernel: ffff8881241bb168 (&q->q_usage_counter(io)#12){++++}-{0:0}, at: blk_mq_freeze_queue_nomemsave+0xe/0x20
Apr 10 10:24:19 testnode2 kernel:
which lock already depends on the new lock.
Apr 10 10:24:19 testnode2 kernel:
the existing dependency chain (in reverse order) is:
Apr 10 10:24:19 testnode2 kernel:
-> #2 (&q->q_usage_counter(io)#12){++++}-{0:0}:
Apr 10 10:24:19 testnode2 kernel: blk_alloc_queue+0x5bc/0x700
Apr 10 10:24:19 testnode2 kernel: blk_mq_alloc_queue+0x14e/0x230
Apr 10 10:24:19 testnode2 kernel: scsi_alloc_sdev+0x848/0xc70
Apr 10 10:24:19 testnode2 kernel: scsi_probe_and_add_lun+0x475/0xb60
Apr 10 10:24:19 testnode2 kernel: __scsi_scan_target+0x18d/0x3b0
Apr 10 10:24:19 testnode2 kernel: scsi_scan_channel+0xfa/0x1a0
Apr 10 10:24:19 testnode2 kernel: scsi_scan_host_selected+0x1f8/0x2b0
Apr 10 10:24:19 testnode2 kernel: scsi_scan_host+0x2e2/0x3a0
Apr 10 10:24:19 testnode2 kernel: sym2_probe.cold+0xd7f/0xddc [sym53c8xx]
Apr 10 10:24:19 testnode2 kernel: local_pci_probe+0xdb/0x180
Apr 10 10:24:19 testnode2 kernel: pci_device_probe+0x588/0x7d0
Apr 10 10:24:19 testnode2 kernel: really_probe+0x1e3/0x8a0
Apr 10 10:24:19 testnode2 kernel: __driver_probe_device+0x18c/0x370
Apr 10 10:24:19 testnode2 kernel: driver_probe_device+0x4a/0x120
Apr 10 10:24:19 testnode2 kernel: __driver_attach+0x190/0x4a0
Apr 10 10:24:19 testnode2 kernel: bus_for_each_dev+0x104/0x180
Apr 10 10:24:19 testnode2 kernel: bus_add_driver+0x29d/0x4d0
Apr 10 10:24:19 testnode2 kernel: driver_register+0x1a1/0x350
Apr 10 10:24:19 testnode2 kernel: vsock_loopback_send_pkt+0xd4/0x120 [vsock_loopback]
Apr 10 10:24:19 testnode2 kernel: do_one_initcall+0xd1/0x450
Apr 10 10:24:19 testnode2 kernel: do_init_module+0x238/0x740
Apr 10 10:24:19 testnode2 kernel: load_module+0x52a3/0x73b0
Apr 10 10:24:19 testnode2 kernel: init_module_from_file+0xe5/0x150
Apr 10 10:24:19 testnode2 kernel: idempotent_init_module+0x22d/0x770
Apr 10 10:24:19 testnode2 kernel: __x64_sys_finit_module+0xbe/0x140
Apr 10 10:24:19 testnode2 kernel: do_syscall_64+0x93/0x190
Apr 10 10:24:19 testnode2 kernel: entry_SYSCALL_64_after_hwframe+0x76/0x7e
Apr 10 10:24:19 testnode2 kernel:
-> #1 (fs_reclaim){+.+.}-{0:0}:
Apr 10 10:24:19 testnode2 kernel: fs_reclaim_acquire+0xc5/0x100
Apr 10 10:24:19 testnode2 kernel: kmem_cache_alloc_noprof+0x4d/0x430
Apr 10 10:24:19 testnode2 kernel: __kernfs_new_node+0xcb/0x750
Apr 10 10:24:19 testnode2 kernel: kernfs_new_node+0xeb/0x1b0
Apr 10 10:24:19 testnode2 kernel: kernfs_create_dir_ns+0x27/0x140
Apr 10 10:24:19 testnode2 kernel: sysfs_create_dir_ns+0x12c/0x270
Apr 10 10:24:19 testnode2 kernel: kobject_add_internal+0x272/0x800
Apr 10 10:24:19 testnode2 kernel: kobject_add+0x131/0x1a0
Apr 10 10:24:19 testnode2 kernel: elv_register_queue+0xb7/0x220
Apr 10 10:24:19 testnode2 kernel: blk_register_queue+0x31a/0x470
Apr 10 10:24:19 testnode2 kernel: add_disk_fwnode+0x712/0x10a0
Apr 10 10:24:19 testnode2 kernel: sd_probe+0x94e/0xf30
Apr 10 10:24:19 testnode2 kernel: really_probe+0x1e3/0x8a0
Apr 10 10:24:19 testnode2 kernel: __driver_probe_device+0x18c/0x370
Apr 10 10:24:19 testnode2 kernel: driver_probe_device+0x4a/0x120
Apr 10 10:24:19 testnode2 kernel: __device_attach_driver+0x15e/0x270
Apr 10 10:24:19 testnode2 kernel: bus_for_each_drv+0x113/0x1a0
Apr 10 10:24:19 testnode2 kernel: __device_attach_async_helper+0x19a/0x240
Apr 10 10:24:19 testnode2 kernel: async_run_entry_fn+0x96/0x4f0
Apr 10 10:24:19 testnode2 kernel: process_one_work+0x84f/0x1460
Apr 10 10:24:19 testnode2 kernel: worker_thread+0x5ef/0xfd0
Apr 10 10:24:19 testnode2 kernel: kthread+0x3b0/0x770
Apr 10 10:24:19 testnode2 kernel: ret_from_fork+0x30/0x70
Apr 10 10:24:19 testnode2 kernel: ret_from_fork_asm+0x1a/0x30
Apr 10 10:24:19 testnode2 kernel:
-> #0 (&q->elevator_lock){+.+.}-{4:4}:
Apr 10 10:24:19 testnode2 kernel: __lock_acquire+0x1405/0x2210
Apr 10 10:24:19 testnode2 kernel: lock_acquire+0x170/0x310
Apr 10 10:24:19 testnode2 kernel: __mutex_lock+0x1a9/0x1a40
Apr 10 10:24:19 testnode2 kernel: elv_iosched_store+0x178/0x4d0
Apr 10 10:24:19 testnode2 kernel: queue_attr_store+0x236/0x2f0
Apr 10 10:24:19 testnode2 kernel: kernfs_fop_write_iter+0x39f/0x5a0
Apr 10 10:24:19 testnode2 kernel: vfs_write+0x5f7/0xe90
Apr 10 10:24:19 testnode2 kernel: ksys_write+0xf5/0x1c0
Apr 10 10:24:19 testnode2 kernel: do_syscall_64+0x93/0x190
Apr 10 10:24:19 testnode2 kernel: entry_SYSCALL_64_after_hwframe+0x76/0x7e
Apr 10 10:24:19 testnode2 kernel:
other info that might help us debug this:
Apr 10 10:24:19 testnode2 kernel: Chain exists of:
&q->elevator_lock --> fs_reclaim --> &q->q_usage_counter(io)#12
Apr 10 10:24:19 testnode2 kernel: Possible unsafe locking scenario:
Apr 10 10:24:19 testnode2 kernel:        CPU0                    CPU1
Apr 10 10:24:19 testnode2 kernel:        ----                    ----
Apr 10 10:24:19 testnode2 kernel:   lock(&q->q_usage_counter(io)#12);
Apr 10 10:24:19 testnode2 kernel:                                lock(fs_reclaim);
Apr 10 10:24:19 testnode2 kernel:                                lock(&q->q_usage_counter(io)#12);
Apr 10 10:24:19 testnode2 kernel:   lock(&q->elevator_lock);
Apr 10 10:24:19 testnode2 kernel:
*** DEADLOCK ***
Apr 10 10:24:19 testnode2 kernel: 5 locks held by (udev-worker)/556:
Apr 10 10:24:19 testnode2 kernel: #0: ffff888120de0420 (sb_writers#4){.+.+}-{0:0}, at: ksys_write+0xf5/0x1c0
Apr 10 10:24:19 testnode2 kernel: #1: ffff888130a62c88 (&of->mutex#2){+.+.}-{4:4}, at: kernfs_fop_write_iter+0x25c/0x5a0
Apr 10 10:24:19 testnode2 kernel: #2: ffff88812d38ec38 (kn->active#72){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x27f/0x5a0
Apr 10 10:24:19 testnode2 kernel: #3: ffff8881241bb168 (&q->q_usage_counter(io)#12){++++}-{0:0}, at: blk_mq_freeze_queue_nomemsave+0xe/0x20
Apr 10 10:24:19 testnode2 kernel: #4: ffff8881241bb1a0 (&q->q_usage_counter(queue)#5){++++}-{0:0}, at: blk_mq_freeze_queue_nomemsave+0xe/0x20
Apr 10 10:24:19 testnode2 kernel:
stack backtrace:
Apr 10 10:24:19 testnode2 kernel: CPU: 0 UID: 0 PID: 556 Comm: (udev-worker) Not tainted 6.15.0-rc1+ #22 PREEMPT(voluntary)
Apr 10 10:24:19 testnode2 kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-3.fc41 04/01/2014
Apr 10 10:24:19 testnode2 kernel: Call Trace:
Apr 10 10:24:19 testnode2 kernel: <TASK>
Apr 10 10:24:19 testnode2 kernel: dump_stack_lvl+0x6a/0x90
Apr 10 10:24:19 testnode2 kernel: print_circular_bug.cold+0x178/0x1be
Apr 10 10:24:19 testnode2 kernel: check_noncircular+0x146/0x160
Apr 10 10:24:19 testnode2 kernel: __lock_acquire+0x1405/0x2210
Apr 10 10:24:19 testnode2 kernel: lock_acquire+0x170/0x310
Apr 10 10:24:19 testnode2 kernel: ? elv_iosched_store+0x178/0x4d0
Apr 10 10:24:19 testnode2 kernel: ? __pfx___might_resched+0x10/0x10
Apr 10 10:24:19 testnode2 kernel: __mutex_lock+0x1a9/0x1a40
Apr 10 10:24:19 testnode2 kernel: ? elv_iosched_store+0x178/0x4d0
Apr 10 10:24:19 testnode2 kernel: ? elv_iosched_store+0x178/0x4d0
Apr 10 10:24:19 testnode2 kernel: ? mark_held_locks+0x40/0x70
Apr 10 10:24:19 testnode2 kernel: ? _raw_spin_unlock_irqrestore+0x4c/0x60
Apr 10 10:24:19 testnode2 kernel: ? __pfx___mutex_lock+0x10/0x10
Apr 10 10:24:19 testnode2 kernel: ? _raw_spin_unlock_irqrestore+0x35/0x60
Apr 10 10:24:19 testnode2 kernel: ? blk_mq_freeze_queue_wait+0x15a/0x170
Apr 10 10:24:19 testnode2 kernel: ? __pfx_autoremove_wake_function+0x10/0x10
Apr 10 10:24:19 testnode2 kernel: ? elv_iosched_store+0x178/0x4d0
Apr 10 10:24:19 testnode2 kernel: elv_iosched_store+0x178/0x4d0
Apr 10 10:24:19 testnode2 kernel: ? lock_acquire+0x170/0x310
Apr 10 10:24:19 testnode2 kernel: ? __pfx_elv_iosched_store+0x10/0x10
Apr 10 10:24:19 testnode2 kernel: ? __pfx___might_resched+0x10/0x10
Apr 10 10:24:19 testnode2 kernel: ? __pfx_sysfs_kf_write+0x10/0x10
Apr 10 10:24:19 testnode2 kernel: queue_attr_store+0x236/0x2f0
Apr 10 10:24:19 testnode2 kernel: ? __pfx_queue_attr_store+0x10/0x10
Apr 10 10:24:19 testnode2 kernel: ? find_held_lock+0x2b/0x80
Apr 10 10:24:19 testnode2 kernel: ? sysfs_file_kobj+0xb3/0x1c0
Apr 10 10:24:19 testnode2 kernel: ? sysfs_file_kobj+0xb3/0x1c0
Apr 10 10:24:19 testnode2 kernel: ? lock_release+0x17d/0x2c0
Apr 10 10:24:19 testnode2 kernel: ? lock_is_held_type+0xd5/0x130
Apr 10 10:24:19 testnode2 kernel: ? __pfx_sysfs_kf_write+0x10/0x10
Apr 10 10:24:19 testnode2 kernel: ? sysfs_file_kobj+0xbd/0x1c0
Apr 10 10:24:19 testnode2 kernel: ? __pfx_sysfs_kf_write+0x10/0x10
Apr 10 10:24:19 testnode2 kernel: kernfs_fop_write_iter+0x39f/0x5a0
Apr 10 10:24:19 testnode2 kernel: vfs_write+0x5f7/0xe90
Apr 10 10:24:19 testnode2 kernel: ? seqcount_lockdep_reader_access.constprop.0+0x82/0x90
Apr 10 10:24:19 testnode2 kernel: ? __pfx_vfs_write+0x10/0x10
Apr 10 10:24:19 testnode2 kernel: ? __lock_acquire+0x45d/0x2210
Apr 10 10:24:19 testnode2 kernel: ? find_held_lock+0x2b/0x80
Apr 10 10:24:19 testnode2 kernel: ? ktime_get_coarse_real_ts64+0x3d/0xd0
Apr 10 10:24:19 testnode2 kernel: ? ktime_get_coarse_real_ts64+0x3d/0xd0
Apr 10 10:24:19 testnode2 kernel: ksys_write+0xf5/0x1c0
Apr 10 10:24:19 testnode2 kernel: ? __pfx_ksys_write+0x10/0x10
Apr 10 10:24:19 testnode2 kernel: ? ktime_get_coarse_real_ts64+0x3d/0xd0
Apr 10 10:24:19 testnode2 kernel: do_syscall_64+0x93/0x190
Apr 10 10:24:19 testnode2 kernel: ? __pfx___might_resched+0x10/0x10
Apr 10 10:24:19 testnode2 kernel: ? find_held_lock+0x2b/0x80
Apr 10 10:24:19 testnode2 kernel: ? __might_fault+0x99/0x120
Apr 10 10:24:19 testnode2 kernel: ? __might_fault+0x99/0x120
Apr 10 10:24:19 testnode2 kernel: ? lock_release+0x17d/0x2c0
Apr 10 10:24:19 testnode2 kernel: ? rcu_is_watching+0x11/0xb0
Apr 10 10:24:19 testnode2 kernel: ? __rseq_handle_notify_resume+0x3f7/0xc00
Apr 10 10:24:19 testnode2 kernel: ? __pfx___rseq_handle_notify_resume+0x10/0x10
Apr 10 10:24:19 testnode2 kernel: ? __pfx_blkcg_maybe_throttle_current+0x10/0x10
Apr 10 10:24:19 testnode2 kernel: ? __pfx___x64_sys_openat+0x10/0x10
Apr 10 10:24:19 testnode2 kernel: ? syscall_exit_to_user_mode+0x8e/0x280
Apr 10 10:24:19 testnode2 kernel: ? rcu_is_watching+0x11/0xb0
Apr 10 10:24:19 testnode2 kernel: ? do_syscall_64+0x9f/0x190
Apr 10 10:24:19 testnode2 kernel: ? lockdep_hardirqs_on+0x78/0x100
Apr 10 10:24:19 testnode2 kernel: ? do_syscall_64+0x9f/0x190
Apr 10 10:24:19 testnode2 kernel: entry_SYSCALL_64_after_hwframe+0x76/0x7e
Apr 10 10:24:19 testnode2 kernel: RIP: 0033:0x7f8784cf4f44
Apr 10 10:24:19 testnode2 kernel: Code: c7 00 16 00 00 00 b8 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 80 3d 85 91 10 00 00 74 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 55 48 89 e5 48 83 ec 20 48 89
Apr 10 10:24:19 testnode2 kernel: RSP: 002b:00007fffc374d1a8 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
Apr 10 10:24:19 testnode2 kernel: RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f8784cf4f44
Apr 10 10:24:19 testnode2 kernel: RDX: 0000000000000003 RSI: 00007fffc374d4b0 RDI: 000000000000001b
Apr 10 10:24:19 testnode2 kernel: RBP: 00007fffc374d1d0 R08: 00007f8784df51c8 R09: 00007fffc374d280
Apr 10 10:24:19 testnode2 kernel: R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000003
Apr 10 10:24:19 testnode2 kernel: R13: 00007fffc374d4b0 R14: 000055941522b580 R15: 00007f8784df4e80
Apr 10 10:24:19 testnode2 kernel: </TASK>
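The bottom of the trace shows the trigger: a sysfs write to the queue's
scheduler attribute, which reaches elv_iosched_store() after
blk_mq_freeze_queue_nomemsave() has already frozen the queue, so
q->elevator_lock is taken while q_usage_counter is held. udev's block rules do
such a write during boot; presumably the same path can be exercised by hand with
something like the following sketch (the device name is an example, adjust to a
real disk):

```shell
# Manual trigger sketch for the lockdep chain above:
# writing "scheduler" goes through queue_attr_store() -> elv_iosched_store().
dev=/sys/block/sda/queue/scheduler   # example device; adjust as needed
if [ -w "$dev" ]; then
    cat "$dev"                       # show the available schedulers
    echo mq-deadline > "$dev"        # switching freezes the queue, then
                                     # takes q->elevator_lock
fi
```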