[PATCH V3] nvme-tcp: teardown circular locking fixes

Chaitanya Kulkarni chaitanyak at nvidia.com
Fri Feb 27 15:12:45 PST 2026


On 2/25/26 18:56, Chaitanya Kulkarni wrote:
> When a controller reset is triggered via sysfs (by writing to
> /sys/class/nvme/<nvmedev>/reset_controller), the reset work tears down
> and re-establishes all queues. Releasing the socket with fput() defers
> the actual cleanup to task_work (or the delayed_fput workqueue). This deferred
> cleanup can race with the subsequent queue re-allocation during reset,
> potentially leading to use-after-free or resource conflicts.
>
> Replace fput() with __fput_sync() to ensure synchronous socket release,
> guaranteeing that all socket resources are fully cleaned up before the
> function returns. This prevents races during controller reset where
> new queue setup may begin before the old socket is fully released.
>
> * Call chain during reset:
>    nvme_reset_ctrl_work()
>      -> nvme_tcp_teardown_ctrl()
>        -> nvme_tcp_teardown_io_queues()
>          -> nvme_tcp_free_io_queues()
>            -> nvme_tcp_free_queue()       <-- fput() -> __fput_sync()
>        -> nvme_tcp_teardown_admin_queue()
>          -> nvme_tcp_free_admin_queue()
>            -> nvme_tcp_free_queue()       <-- fput() -> __fput_sync()
>      -> nvme_tcp_setup_ctrl()             <-- race with deferred fput
>
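To make the race above concrete, here is a diff-style sketch of the change in nvme_tcp_free_queue(). The sock_file field name is illustrative; the actual field in the driver may differ:

```
 static void nvme_tcp_free_queue(struct nvme_ctrl *nctrl, int qid)
 {
 	...
-	/* Deferred: the final release runs later from task_work and
-	 * can race with queue re-allocation in nvme_tcp_setup_ctrl(). */
-	fput(queue->sock_file);
+	/* Synchronous: the socket file is fully released before this
+	 * function returns, closing the race window against reset. */
+	__fput_sync(queue->sock_file);
 	...
 }
```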
> memalloc_noreclaim_save() sets PF_MEMALLOC which is intended for tasks
> performing memory reclaim work that need reserve access. While PF_MEMALLOC
> prevents the task from entering direct reclaim (causing __need_reclaim() to
> return false), it does not strip __GFP_IO from gfp flags. The allocator can
> therefore still trigger writeback I/O when __GFP_IO remains set, which is
> unsafe when the caller holds block layer locks.
>
> Switch to memalloc_noio_save() which sets PF_MEMALLOC_NOIO. This causes
> current_gfp_context() to strip __GFP_IO|__GFP_FS from every allocation in
> the scope, making it safe to allocate memory while holding elevator_lock and
> set->srcu.
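To illustrate the scoping, a minimal kernel-style sketch follows; everything except the memalloc_noio_save()/memalloc_noio_restore() pair is illustrative:

```
	unsigned int noio_flags;

	/* PF_MEMALLOC_NOIO: current_gfp_context() strips
	 * __GFP_IO | __GFP_FS from every allocation in this scope,
	 * so the allocator cannot trigger writeback I/O while we
	 * hold elevator_lock and set->srcu. */
	noio_flags = memalloc_noio_save();

	ret = realloc_queue_memory(set);	/* illustrative callee */

	memalloc_noio_restore(noio_flags);
```

Note that the save/restore pair nests safely, so it is fine even if a caller further up the stack has already entered a NOIO scope.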
>
Can you please take a look when you get a chance? See also [1].

-ck

[1] https://lore.kernel.org/all/20251125005950.41046-1-ckulkarnilinux@gmail.com/

More information about the Linux-nvme mailing list