[bug report] WARNING: CPU: 3 PID: 522 at block/genhd.c:144 bdev_count_inflight_rw+0x26e/0x410
Calvin Owens
calvin at wbinvd.org
Thu Jun 19 21:10:42 PDT 2025
On Tuesday 06/10 at 10:07 +0800, Yu Kuai wrote:
> So, this is blk-mq IO accounting, a different problem than nvme mpath.
>
> What kind of test are you running, and can you reproduce this problem?
> I don't have a clue yet after a quick code review.
>
> Thanks,
> Kuai
Hi all,
I've also been hitting this warning; I can reproduce it pretty
consistently within a few hours of running large Yocto builds. If I can
help test any patches, let me know.
A close approximation to what I'm doing is to clone Poky and build
core-image-weston: https://github.com/yoctoproject/poky
Using higher-than-reasonable concurrency seems to help trigger it: I'm
setting BB_NUMBER_THREADS and PARALLEL_MAKE to 2x-4x the number of CPUs.
I'm trying to narrow it down to a simpler reproducer, but haven't had
any luck yet.
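For concreteness, those knobs live in build/conf/local.conf; on a
16-core machine the 4x end of that range would look something like this
(the exact numbers here are only illustrative):

  # conf/local.conf: oversubscribe the build, roughly 2x-4x the core count
  BB_NUMBER_THREADS = "64"
  PARALLEL_MAKE = "-j 64"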
I see this on three machines. One is btrfs/luks/nvme; the other two are
btrfs/luks/mdraid1/nvme*2. All three have a very large swapfile on the
rootfs. This is from the machine without mdraid:
------------[ cut here ]------------
WARNING: CPU: 6 PID: 1768274 at block/genhd.c:144 bdev_count_inflight_rw+0x8a/0xc0
CPU: 6 UID: 1000 PID: 1768274 Comm: cc1plus Not tainted 6.16.0-rc2-gcc-slubdebug-lockdep-00071-g74b4cc9b8780 #1 PREEMPT
Hardware name: Gigabyte Technology Co., Ltd. A620I AX/A620I AX, BIOS F3 07/10/2023
RIP: 0010:bdev_count_inflight_rw+0x8a/0xc0
Code: 00 01 d7 89 3e 49 8b 50 20 4a 03 14 d5 c0 4b 76 82 48 8b 92 90 00 00 00 01 d1 48 63 d0 89 4e 04 48 83 fa 1f 76 92 85 ff 79 a7 <0f> 0b c7 06 00 00 00 00 85 c9 79 9f 0f 0b c7 46 04 00 00 00 00 48
RSP: 0000:ffffc9002b027ab8 EFLAGS: 00010282
RAX: 0000000000000020 RBX: ffff88810dec0000 RCX: 000000000000000a
RDX: 0000000000000020 RSI: ffffc9002b027ac8 RDI: 00000000fffffffe
RBP: ffff88810dec0000 R08: ffff888100660b40 R09: ffffffffffffffff
R10: 000000000000001f R11: ffff888f3a30e9a8 R12: ffff8881098855d0
R13: ffffc9002b027b90 R14: 0000000000000001 R15: ffffc9002b027e18
FS: 00007fb394b48400(0000) GS:ffff888ccc9b9000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fb3884a81c8 CR3: 00000013db708000 CR4: 0000000000350ef0
Call Trace:
<TASK>
bdev_count_inflight+0x16/0x30
update_io_ticks+0xb7/0xd0
blk_account_io_start+0xe8/0x200
blk_mq_submit_bio+0x34c/0x910
__submit_bio+0x95/0x5a0
? submit_bio_noacct_nocheck+0x169/0x400
submit_bio_noacct_nocheck+0x169/0x400
swapin_readahead+0x18a/0x550
? __filemap_get_folio+0x26/0x400
? get_swap_device+0xe8/0x210
? lock_release+0xc3/0x2a0
do_swap_page+0x1fa/0x1850
? __lock_acquire+0x46d/0x25c0
? wake_up_state+0x10/0x10
__handle_mm_fault+0x5e5/0x880
handle_mm_fault+0x70/0x2e0
exc_page_fault+0x374/0x8a0
asm_exc_page_fault+0x22/0x30
RIP: 0033:0x915570
Code: ff 01 0f 86 c4 05 00 00 41 56 41 55 41 54 55 48 89 fd 53 48 89 fb 0f 1f 40 00 48 89 df e8 98 c8 0b 00 84 c0 0f 85 90 05 00 00 <0f> b7 03 48 c1 e0 06 80 b8 99 24 d1 02 00 48 8d 90 80 24 d1 02 0f
RSP: 002b:00007ffc9327dfd0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 00007fb3884a81c8 RCX: 0000000000000008
RDX: 0000000000000006 RSI: 0000000005dba008 RDI: 0000000000000000
RBP: 00007fb3884a81c8 R08: 000000000000000c R09: 00000007fb3884a8
R10: 0000000000000007 R11: 0000000000000206 R12: 0000000000000000
R13: 0000000000000002 R14: 00007ffc9329cb90 R15: 00007fb36e5d2700
</TASK>
irq event stamp: 36649373
hardirqs last enabled at (36649387): [<ffffffff813cea2d>] __up_console_sem+0x4d/0x50
hardirqs last disabled at (36649398): [<ffffffff813cea12>] __up_console_sem+0x32/0x50
softirqs last enabled at (36648786): [<ffffffff8136017f>] __irq_exit_rcu+0x8f/0xb0
softirqs last disabled at (36648617): [<ffffffff8136017f>] __irq_exit_rcu+0x8f/0xb0
---[ end trace 0000000000000000 ]---
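For context: if I'm reading block/genhd.c right, the WARN at line 144
fires when the per-CPU inflight counters maintained by the blk-mq IO
accounting (incremented in blk_account_io_start(), decremented at
completion) sum to a negative value. The toy program below is only my
own simplified model of that pattern, not kernel code, and I'm not
claiming this is what's going wrong here; it just shows how an
unsynchronized walk over per-CPU counters can transiently observe a
negative total when an IO starts on one CPU and completes on another:

/*
 * NOT kernel code: a toy model of per-CPU inflight accounting.
 * Each "IO" increments a counter on the CPU it was submitted from and
 * decrements a counter on the (possibly different) CPU it completed on.
 * A reader that walks the counters one by one without any locking can
 * see a transiently negative total even though nothing leaked.
 *
 * Build: gcc -O2 -pthread -o inflight-toy inflight-toy.c
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

#define NCPUS    8
#define NWORKERS 4

static atomic_long inflight[NCPUS];
static atomic_int stop;

static void *io_worker(void *arg)
{
	unsigned int seed = (unsigned int)(unsigned long)arg;

	while (!atomic_load(&stop)) {
		int submit_cpu = rand_r(&seed) % NCPUS;
		int complete_cpu = rand_r(&seed) % NCPUS;

		/* "submit" on one CPU... */
		atomic_fetch_add(&inflight[submit_cpu], 1);
		/* ...and "complete" on another */
		atomic_fetch_sub(&inflight[complete_cpu], 1);
	}
	return NULL;
}

int main(void)
{
	pthread_t tids[NWORKERS];
	long i, iter;

	for (i = 0; i < NWORKERS; i++)
		pthread_create(&tids[i], NULL, io_worker, (void *)(i + 1));

	for (iter = 0; iter < 10000000; iter++) {
		long sum = 0;

		/* unsynchronized walk, analogous to for_each_possible_cpu() */
		for (i = 0; i < NCPUS; i++)
			sum += atomic_load(&inflight[i]);

		if (sum < 0) {
			printf("iteration %ld: total inflight = %ld\n", iter, sum);
			break;
		}
	}

	atomic_store(&stop, 1);
	for (i = 0; i < NWORKERS; i++)
		pthread_join(tids[i], NULL);
	return 0;
}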
I dumped all the similar WARNs I've seen here (blk-warn-%d.txt):
https://github.com/jcalvinowens/lkml-debug-616/tree/master
I don't have any evidence it's related, but I'm also hitting a rare
oops in futex with this same Yocto build workload. Sebastian has done
some analysis here:
https://lore.kernel.org/lkml/20250618160333.PdGB89yt@linutronix.de/
I get this warning most of the time when I hit the oops, but not
always. Is anyone else seeing the oops?
Thanks,
Calvin