[PATCH] nvme-rdma: avoid stale device wrapper in remove_one
Nilay Shroff
nilay at linux.ibm.com
Mon Jun 22 05:42:30 PDT 2026
On 6/21/26 7:29 PM, Cen Zhang wrote:
> nvme_rdma_remove_one() walks nvme_rdma_ctrl_list under
> nvme_rdma_ctrl_mutex, but it identifies matching controllers by reading
> ctrl->device->dev. The mutex only protects controller list membership.
> ctrl->device is a cached copy of queue 0's nvme_rdma_device, and that
> wrapper is refcounted by queue lifetime.
>
> The buggy scenario involves two paths, with each column showing the order
> within that path:
>
>   RDMA remove callback:              Controller error recovery:
>   1. enter nvme_rdma_remove_one()    1. run nvme_rdma_error_recovery_work()
>   2. walk nvme_rdma_ctrl_list        2. tear down the admin queue
>   3. read ctrl->device->dev          3. drop the final queue device ref
>                                      4. free the nvme_rdma_device wrapper
>
> Fix this by caching the ib_device identity in the controller when the
> admin queue is configured. The remove callback can then compare the
> cached pointer value against the ib_device being removed without
> dereferencing the queue-owned nvme_rdma_device wrapper. Keep the delete
> workqueue flush conditional on actually matching a controller.
>
I think that, instead of caching the ib_device in struct nvme_rdma_ctrl, we can
take a reference on the matching nvme_rdma_device (ndev) while walking
device_list. With that reference held, the wrapper cannot be freed until we put
the reference, so it stays valid while we loop through the ctrl list.
We can then loop through the ctrl list and compare ctrl->device directly against
that matching ndev. That avoids dereferencing ctrl->device->dev and reuses
the existing kref-based lifetime management for nvme_rdma_device, rather than
introducing a separate cached ib_device pointer. At the end, we drop the
reference to ndev.
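
To make the idea concrete, here is a rough, untested userspace mock of the
flow I have in mind. It is not a patch against the driver: the structs are
cut-down stand-ins, a plain int replaces struct kref, singly linked lists
replace the kernel lists, and the helpers dev_get(), dev_put() and
remove_one() are hypothetical names used only for illustration. Locking is
omitted and only noted in comments.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel types; only the fields used here. */
struct ib_device { int id; };

struct nvme_rdma_device {
	struct ib_device *dev;
	int ref;                        /* stands in for struct kref */
	int freed;                      /* set where the kernel would free */
	struct nvme_rdma_device *next;  /* stands in for the device_list entry */
};

struct nvme_rdma_ctrl {
	struct nvme_rdma_device *device; /* cached copy of queue 0's wrapper */
	int deleted;                     /* set where the ctrl would be deleted */
	struct nvme_rdma_ctrl *next;
};

static struct nvme_rdma_device *device_list;
static struct nvme_rdma_ctrl *ctrl_list;

/* Take a reference unless the count has already dropped to zero. */
static int dev_get(struct nvme_rdma_device *ndev)
{
	if (ndev->ref == 0)
		return 0;
	ndev->ref++;
	return 1;
}

/* Drop a reference; the last put is where the wrapper would be freed. */
static void dev_put(struct nvme_rdma_device *ndev)
{
	assert(ndev->ref > 0);
	if (--ndev->ref == 0)
		ndev->freed = 1;
}

/* Returns the number of controllers that matched the removed device. */
static int remove_one(struct ib_device *ib_device)
{
	struct nvme_rdma_device *ndev;
	struct nvme_rdma_ctrl *ctrl;
	int matched = 0;

	/* In the driver this walk would run under the device list lock. */
	for (ndev = device_list; ndev; ndev = ndev->next)
		if (ndev->dev == ib_device && dev_get(ndev))
			break;
	if (!ndev)
		return 0;

	/*
	 * In the driver this walk would run under nvme_rdma_ctrl_mutex.
	 * Only the pointer value of ctrl->device is compared; it is never
	 * dereferenced, so a stale pointer to some other wrapper is harmless.
	 */
	for (ctrl = ctrl_list; ctrl; ctrl = ctrl->next) {
		if (ctrl->device != ndev)
			continue;
		ctrl->deleted = 1;
		matched++;
	}

	dev_put(ndev);
	return matched; /* caller flushes the delete workqueue only if > 0 */
}
```

Because we hold our own reference on ndev for the whole ctrl list walk, its
address cannot be freed and reused underneath us, so the pointer comparison
stays meaningful. If the wrapper's refcount has already hit zero, the get
fails and we treat the device as having no controllers left to delete.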
Thanks,
--Nilay