[PATCH V2] nvmet: move async event work off nvmet-wq

Chaitanya Kulkarni chaitanyak at nvidia.com
Mon Mar 9 22:44:34 PDT 2026


On 2/25/26 20:30, Chaitanya Kulkarni wrote:
> On the target side, nvmet_ctrl_free() flushes ctrl->async_event_work.
> If nvmet_ctrl_free() itself runs on nvmet-wq, the flush re-enters the
> workqueue completion lockdep map for the same worker:
>
> A. Async event work queued on nvmet-wq (prior to disconnect):
>    nvmet_execute_async_event()
>       queue_work(nvmet_wq, &ctrl->async_event_work)
>
>    nvmet_add_async_event()
>       queue_work(nvmet_wq, &ctrl->async_event_work)
>
> B. Full pre-work chain (RDMA CM path):
>    nvmet_rdma_cm_handler()
>       nvmet_rdma_queue_disconnect()
>         __nvmet_rdma_queue_disconnect()
>           queue_work(nvmet_wq, &queue->release_work)
>             process_one_work()
>               lock((wq_completion)nvmet-wq)  <--------- 1st
>               nvmet_rdma_release_queue_work()
>
> C. Recursive path (same worker):
>    nvmet_rdma_release_queue_work()
>       nvmet_rdma_free_queue()
>         nvmet_sq_destroy()
>           nvmet_ctrl_put()
>             nvmet_ctrl_free()
>               flush_work(&ctrl->async_event_work)
>                 __flush_work()
>                   touch_wq_lockdep_map()
>                   lock((wq_completion)nvmet-wq) <--------- 2nd
>
> Lockdep splat:
>
>    ============================================
>    WARNING: possible recursive locking detected
>    6.19.0-rc3nvme+ #14 Tainted: G                 N
>    --------------------------------------------
>    kworker/u192:42/44933 is trying to acquire lock:
>    ffff888118a00948 ((wq_completion)nvmet-wq){+.+.}-{0:0}, at: touch_wq_lockdep_map+0x26/0x90
>
>    but task is already holding lock:
>    ffff888118a00948 ((wq_completion)nvmet-wq){+.+.}-{0:0}, at: process_one_work+0x53e/0x660
>
>    3 locks held by kworker/u192:42/44933:
>     #0: ffff888118a00948 ((wq_completion)nvmet-wq){+.+.}-{0:0}, at: process_one_work+0x53e/0x660
>     #1: ffffc9000e6cbe28 ((work_completion)(&queue->release_work)){+.+.}-{0:0}, at: process_one_work+0x1c5/0x660
>     #2: ffffffff82d4db60 (rcu_read_lock){....}-{1:3}, at: __flush_work+0x62/0x530
>
>    Workqueue: nvmet-wq nvmet_rdma_release_queue_work [nvmet_rdma]
>    Call Trace:
>     __flush_work+0x268/0x530
>     nvmet_ctrl_free+0x140/0x310 [nvmet]
>     nvmet_cq_put+0x74/0x90 [nvmet]
>     nvmet_rdma_free_queue+0x23/0xe0 [nvmet_rdma]
>     nvmet_rdma_release_queue_work+0x19/0x50 [nvmet_rdma]
>     process_one_work+0x206/0x660
>     worker_thread+0x184/0x320
>     kthread+0x10c/0x240
>     ret_from_fork+0x319/0x390
>
> Move async event work to a dedicated nvmet-aen-wq to avoid reentrant
> flush on nvmet-wq.
>
> Signed-off-by: Chaitanya Kulkarni <kch at nvidia.com>
> ---

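For reference, the shape of the fix described in the changelog is roughly the
following (a sketch, not the actual patch hunks: the init/queueing sites shown
here are illustrative, and error/teardown handling in the real patch may
differ). A second workqueue means the flush_work() issued from a worker
running on nvmet-wq no longer acquires nvmet-wq's own wq_completion lockdep
map:

```c
/*
 * Illustrative sketch only: allocate a dedicated workqueue for async
 * event work so that flushing it from a worker executing on nvmet-wq
 * does not re-acquire (wq_completion)nvmet-wq.
 */
static struct workqueue_struct *nvmet_aen_wq;

static int nvmet_aen_wq_init(void)
{
	/* WQ_MEM_RECLAIM mirrors nvmet-wq; exact flags are an assumption */
	nvmet_aen_wq = alloc_workqueue("nvmet-aen-wq", WQ_MEM_RECLAIM, 0);
	if (!nvmet_aen_wq)
		return -ENOMEM;
	return 0;
}

/*
 * Queueing sites (nvmet_execute_async_event(), nvmet_add_async_event())
 * then change from nvmet_wq to nvmet_aen_wq, e.g.:
 *
 *	queue_work(nvmet_aen_wq, &ctrl->async_event_work);
 *
 * so nvmet_ctrl_free()'s flush_work(&ctrl->async_event_work) touches
 * nvmet-aen-wq's lockdep map instead of the one already held.
 */
```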

Can we please merge this?

-ck



