[PATCH V2] nvmet: move async event work off nvmet-wq
Chaitanya Kulkarni
chaitanyak at nvidia.com
Mon Mar 9 22:44:34 PDT 2026
On 2/25/26 20:30, Chaitanya Kulkarni wrote:
> On the target, nvmet_ctrl_free() flushes ctrl->async_event_work.
> If nvmet_ctrl_free() runs on nvmet-wq, the flush re-enters workqueue
> completion for the same worker:
>
> A. Async event work queued on nvmet-wq (prior to disconnect):
>   nvmet_execute_async_event()
>     queue_work(nvmet_wq, &ctrl->async_event_work)
>
>   nvmet_add_async_event()
>     queue_work(nvmet_wq, &ctrl->async_event_work)
>
> B. Full pre-work chain (RDMA CM path):
>   nvmet_rdma_cm_handler()
>     nvmet_rdma_queue_disconnect()
>       __nvmet_rdma_queue_disconnect()
>         queue_work(nvmet_wq, &queue->release_work)
>           process_one_work()
>             lock((wq_completion)nvmet-wq) <--------- 1st
>             nvmet_rdma_release_queue_work()
>
> C. Recursive path (same worker):
>   nvmet_rdma_release_queue_work()
>     nvmet_rdma_free_queue()
>       nvmet_sq_destroy()
>         nvmet_ctrl_put()
>           nvmet_ctrl_free()
>             flush_work(&ctrl->async_event_work)
>               __flush_work()
>                 touch_wq_lockdep_map()
>                   lock((wq_completion)nvmet-wq) <--------- 2nd
>
> Lockdep splat:
>
> ============================================
> WARNING: possible recursive locking detected
> 6.19.0-rc3nvme+ #14 Tainted: G N
> --------------------------------------------
> kworker/u192:42/44933 is trying to acquire lock:
> ffff888118a00948 ((wq_completion)nvmet-wq){+.+.}-{0:0}, at: touch_wq_lockdep_map+0x26/0x90
>
> but task is already holding lock:
> ffff888118a00948 ((wq_completion)nvmet-wq){+.+.}-{0:0}, at: process_one_work+0x53e/0x660
>
> 3 locks held by kworker/u192:42/44933:
> #0: ffff888118a00948 ((wq_completion)nvmet-wq){+.+.}-{0:0}, at: process_one_work+0x53e/0x660
> #1: ffffc9000e6cbe28 ((work_completion)(&queue->release_work)){+.+.}-{0:0}, at: process_one_work+0x1c5/0x660
> #2: ffffffff82d4db60 (rcu_read_lock){....}-{1:3}, at: __flush_work+0x62/0x530
>
> Workqueue: nvmet-wq nvmet_rdma_release_queue_work [nvmet_rdma]
> Call Trace:
> __flush_work+0x268/0x530
> nvmet_ctrl_free+0x140/0x310 [nvmet]
> nvmet_cq_put+0x74/0x90 [nvmet]
> nvmet_rdma_free_queue+0x23/0xe0 [nvmet_rdma]
> nvmet_rdma_release_queue_work+0x19/0x50 [nvmet_rdma]
> process_one_work+0x206/0x660
> worker_thread+0x184/0x320
> kthread+0x10c/0x240
> ret_from_fork+0x319/0x390
>
> Move async event work to a dedicated nvmet-aen-wq to avoid reentrant
> flush on nvmet-wq.
>
> Signed-off-by: Chaitanya Kulkarni <kch at nvidia.com>
> ---
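For anyone skimming the thread, the recursion in A-C above can be modelled in
userspace roughly as follows. This is only a sketch, not kernel code and not
the patch itself: the names held_locks, acquire(), release() and
flush_from_release_work() are hypothetical, and the check is a much simplified
stand-in for what lockdep does (it only looks for the same lock class being
taken twice by one context).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lock classes, one per workqueue completion map. */
enum wq_class { WQ_NVMET, WQ_NVMET_AEN };

#define MAX_HELD 8

/* Lock classes currently held by one worker (simplified model). */
struct held_locks {
	enum wq_class cls[MAX_HELD];
	size_t depth;
};

/* Refuse the acquisition if this worker already holds the same class. */
static bool acquire(struct held_locks *h, enum wq_class cls)
{
	assert(h->depth < MAX_HELD);
	for (size_t i = 0; i < h->depth; i++)
		if (h->cls[i] == cls)
			return false;	/* "possible recursive locking" */
	h->cls[h->depth++] = cls;
	return true;
}

static void release(struct held_locks *h)
{
	assert(h->depth > 0);
	h->depth--;
}

/*
 * Models release_work running on nvmet-wq and flushing async_event_work,
 * which is queued on the workqueue identified by async_wq.
 * Returns true if the flush does not trip the recursion check.
 */
static bool flush_from_release_work(enum wq_class async_wq)
{
	struct held_locks h = { .depth = 0 };
	bool ok;

	(void)acquire(&h, WQ_NVMET);	/* process_one_work(): 1st */
	ok = acquire(&h, async_wq);	/* flush_work() path: 2nd */
	if (ok)
		release(&h);
	release(&h);
	return ok;
}
```

In this model, flush_from_release_work(WQ_NVMET) returns false (the case in
the splat), while flush_from_release_work(WQ_NVMET_AEN) returns true, which is
the effect of moving the async event work to its own workqueue.
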
Can we please merge this?
-ck