[bug report] workqueue lockup under fio stress on NVMe drive

Gerd Bayer gbayer at linux.ibm.com
Fri May 5 03:40:43 PDT 2023


Hi all,

I could use some help sorting out the following fallout, which a recent
addition to our test suite triggers consistently on a range of recent
upstream kernel versions.
It hits with the "debug_defconfig" on ARCH=s390, while "defconfig" runs
clean. But then, CONFIG_WQ_WATCHDOG is enabled only in
"debug_defconfig".

There might be some interaction with other config options in
"debug_defconfig", as the BUG does not show up with just
CONFIG_WQ_WATCHDOG added to "defconfig".
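
The "defconfig plus just CONFIG_WQ_WATCHDOG" experiment was done roughly
like this (paraphrased; scripts/config is the helper shipped in the
kernel tree):

make ARCH=s390 defconfig
./scripts/config --enable CONFIG_WQ_WATCHDOG
make ARCH=s390 olddefconfig    # let Kconfig resolve dependencies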

I wonder whether other architectures see similar issues under this kind
of stress? At any rate, I'm open to suggestions on how to attack this
issue.

I'm also open to any advice on whether the combination of test case and
additional CONFIG options in "debug_defconfig" is abusive...


For a clean baseline I just re-created this on a 6.3.0 kernel with the
following "workload":

fio --readonly pci_nvme_read.fiojob 

with pci_nvme_read.fiojob specifying:

[iopstest]
filename=/dev/nvme0n1
rw=read
size=512m
blocksize=4k
iodepth=32
; also run with numjobs raised up to $nproc=18
numjobs=14
time_based
runtime=1m
ramp_time=10s
; also reproduced with ioengine=libaio
ioengine=io_uring
direct=1
norandommap
group_reporting=1
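
In case it helps with reproducing elsewhere: the same job should be
expressible directly on the fio command line (untested transcription of
the job file above):

fio --readonly --name=iopstest --filename=/dev/nvme0n1 --rw=read \
    --size=512m --blocksize=4k --iodepth=32 --numjobs=14 \
    --time_based --runtime=1m --ramp_time=10s --ioengine=io_uring \
    --direct=1 --norandommap --group_reporting=1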


I get the following messages logged:

[  235.438022] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 65s!
[  235.438265] BUG: workqueue lockup - pool cpus=7 node=0 flags=0x0 nice=0 stuck for 61s!
[  235.438278] BUG: workqueue lockup - pool cpus=10 node=0 flags=0x0 nice=0 stuck for 46s!
[  235.438290] BUG: workqueue lockup - pool cpus=10 node=0 flags=0x0 nice=-20 stuck for 41s!
[  235.438438] Showing busy workqueues and worker pools:
[  235.438493] workqueue events: flags=0x0
[  235.438621]   pwq 20: cpus=10 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
[  235.438624]     pending: kfree_rcu_monitor
[  235.438634]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=2/256 refcnt=3
[  235.438636]     pending: vmstat_shepherd, kfree_rcu_monitor
[  235.438736] workqueue events_unbound: flags=0x2
[  235.438752]   pwq 760: cpus=0-379 flags=0x4 nice=0 active=1/1520 refcnt=3
[  235.438754]     in-flight: 588:toggle_allocation_gate
[  235.438944] workqueue rcu_gp: flags=0x8
[  235.439069]   pwq 20: cpus=10 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
[  235.439072]     in-flight: 259:wait_rcu_exp_gp
[  235.439125] workqueue rcu_par_gp: flags=0x8
[  235.439251]   pwq 20: cpus=10 node=0 flags=0x0 nice=0 active=2/256 refcnt=4
[  235.439253]     in-flight: 115:sync_rcu_exp_select_node_cpus BAR(259)
[  235.439255]     pending: sync_rcu_exp_select_node_cpus
[  235.439413] workqueue mm_percpu_wq: flags=0x8
[  235.439542]   pwq 14: cpus=7 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
[  235.439544]     pending: lru_add_drain_per_cpu
[  235.439550]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=3
[  235.439552]     pending: lru_add_drain_per_cpu BAR(140)
[  235.439817] workqueue writeback: flags=0x4a
[  235.439969]   pwq 760: cpus=0-379 flags=0x4 nice=0 active=1/256 refcnt=3
[  235.439971]     in-flight: 1345:wb_workfn
[  235.440110] workqueue kblockd: flags=0x18
[  235.440238]   pwq 21: cpus=10 node=0 flags=0x0 nice=-20 active=1/256 refcnt=2
[  235.440241]     pending: blk_mq_run_work_fn
[  235.441046] workqueue mld: flags=0x40008
[  235.441173]   pwq 20: cpus=10 node=0 flags=0x0 nice=0 active=1/1 refcnt=2
[  235.441175]     pending: mld_ifc_work
[  235.441603] workqueue mlx5e: flags=0xe000a
[  235.441611]   pwq 760: cpus=0-379 flags=0x4 nice=0 active=1/1 refcnt=4
[  235.441613]     in-flight: 12:mlx5e_set_rx_mode_work [mlx5_core]
[  235.441741]     inactive: mlx5e_set_rx_mode_work [mlx5_core]
[  235.443027] pool 20: cpus=10 node=0 flags=0x0 nice=0 hung=46s workers=3 idle: 69
[  235.443339] pool 760: cpus=0-379 flags=0x4 nice=0 hung=0s workers=19 idle: 13 568 10 586 1987 567 587 1988 147 584 1321 1353 1986 1989 1990 1991
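
For completeness: with CONFIG_WQ_WATCHDOG the detection threshold
defaults to 30 seconds and, if I read kernel/workqueue.c correctly, can
be inspected or adjusted at runtime, e.g. to tell a transient stall from
a permanent one:

cat /sys/module/workqueue/parameters/watchdog_thresh     # default 30 (seconds)
echo 60 > /sys/module/workqueue/parameters/watchdog_thresh   # raise threshold; 0 disables the watchdog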

Thank you,
Gerd


