[PATCH v2] nvme-rdma: pin device wrapper in remove_one
Cen Zhang
zzzccc427 at gmail.com
Mon Jun 22 16:21:31 PDT 2026
nvme_rdma_remove_one() first verifies that an ib_device has an
nvme_rdma_device on device_list, but it drops device_list_mutex before it
walks nvme_rdma_ctrl_list. It then identifies matching controllers by
dereferencing ctrl->device->dev while holding only nvme_rdma_ctrl_mutex.

ctrl->device is a cached copy of queue 0's nvme_rdma_device pointer. The
wrapper's kref is held by the queues and dropped at queue teardown, so
controller list membership does not keep the wrapper alive.

The buggy scenario involves two paths, with each column showing the order
within that path:

  RDMA remove callback:            Controller error recovery:
  1. find ndev on device_list      1. run nvme_rdma_error_recovery_work()
  2. drop device_list_mutex        2. tear down the admin queue
  3. walk nvme_rdma_ctrl_list      3. drop the final queue device ref
  4. read ctrl->device->dev        4. free the nvme_rdma_device wrapper

Fix this by taking a temporary reference to the matching nvme_rdma_device
while still holding device_list_mutex. The controller walk can then
compare ctrl->device directly with the pinned wrapper without
dereferencing a queue-owned object that might have been freed. Release the
temporary reference after the optional delete workqueue flush.
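
The pin-then-compare pattern described above can be illustrated with a small
userspace model. This is only a sketch and not the driver code: every
model_* name is hypothetical, the mutexes are represented by comments, the
delete workqueue flush is omitted, and a "freed" flag stands in for kfree()
so the model can be inspected safely. The concurrent_teardown hook replays
the error recovery path at the point where device_list_mutex has already
been dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct nvme_rdma_device. */
struct model_dev {
	int refs;		/* models the kref count */
	bool freed;		/* set instead of calling free() */
	const void *ibdev;	/* models ndev->dev */
};

/* Hypothetical stand-in for struct nvme_rdma_ctrl. */
struct model_ctrl {
	struct model_dev *device;	/* cached pointer, holds no reference */
	bool deleted;
};

struct model_result {
	int deleted;		/* controllers matched and deleted */
	bool alive_during_walk;	/* wrapper still allocated during the walk? */
};

/* Models kref_get_unless_zero(): never revive an object at zero. */
static bool model_dev_get(struct model_dev *d)
{
	if (d->refs == 0)
		return false;
	d->refs++;
	return true;
}

/* Models kref_put(): the last put "frees" the wrapper. */
static void model_dev_put(struct model_dev *d)
{
	assert(d->refs > 0);
	if (--d->refs == 0)
		d->freed = true;
}

/* Models queue teardown dropping a queue-owned reference. */
static void model_queue_teardown(struct model_dev *d)
{
	model_dev_put(d);
}

static struct model_result
model_remove_one(struct model_dev *devs, size_t ndevs,
		 struct model_ctrl *ctrls, size_t nctrls,
		 const void *ibdev,
		 void (*concurrent_teardown)(struct model_dev *))
{
	struct model_result res = { 0, false };
	struct model_dev *pinned = NULL;
	size_t i;

	/* device_list_mutex would be held for this lookup */
	for (i = 0; i < ndevs; i++) {
		if (devs[i].ibdev == ibdev && model_dev_get(&devs[i])) {
			pinned = &devs[i];
			break;
		}
	}
	/* device_list_mutex dropped here */
	if (!pinned)
		return res;

	/* Window in which error recovery may drop the queue's reference. */
	if (concurrent_teardown)
		concurrent_teardown(pinned);

	/* nvme_rdma_ctrl_mutex would be held for this walk */
	res.alive_during_walk = !pinned->freed;
	for (i = 0; i < nctrls; i++) {
		if (ctrls[i].device != pinned)	/* pointer compare, no dereference */
			continue;
		ctrls[i].deleted = true;
		res.deleted++;
	}

	model_dev_put(pinned);	/* release the temporary reference */
	return res;
}
```

In this model the temporary reference keeps the wrapper allocated across the
controller walk even when the queue's reference is dropped in the window, and
the wrapper is only released by the final put at the end of
model_remove_one().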
Validation reproduced this kernel report:
BUG: KASAN: slab-use-after-free in nvme_rdma_remove_one+0x281/0x2c0 [nvme_rdma]
Call Trace:
<TASK>
dump_stack_lvl+0x66/0xa0
print_report+0xce/0x630
? nvme_rdma_remove_one+0x281/0x2c0 [nvme_rdma]
? srso_alias_return_thunk+0x5/0xfbef5
? __virt_addr_valid+0x20d/0x410
? nvme_rdma_remove_one+0x281/0x2c0 [nvme_rdma]
kasan_report+0xe0/0x110
? nvme_rdma_remove_one+0x281/0x2c0 [nvme_rdma]
nvme_rdma_remove_one+0x281/0x2c0 [nvme_rdma]
remove_client_context+0xa9/0xf0 [ib_core]
disable_device+0x12d/0x240 [ib_core]
? __pfx_disable_device+0x10/0x10 [ib_core]
? srso_alias_return_thunk+0x5/0xfbef5
? __mutex_unlock_slowpath+0x147/0x900
__ib_unregister_device+0x26f/0x460 [ib_core]
ib_unregister_device_and_put+0x55/0x70 [ib_core]
nldev_dellink+0x29e/0x3c0 [ib_core]
? unwind_next_frame+0x6e3/0x2190
? __pfx_nldev_dellink+0x10/0x10 [ib_core]
? lock_acquire+0x2b8/0x2f0
? srso_alias_return_thunk+0x5/0xfbef5
? cap_capable+0x196/0x330
? __pfx_down_read+0x10/0x10
rdma_nl_rcv_msg+0x2db/0x5f0 [ib_core]
? __pfx_rdma_nl_rcv_msg+0x10/0x10 [ib_core]
rdma_nl_rcv_skb.constprop.0.isra.0+0x222/0x380 [ib_core]
? __pfx_rdma_nl_rcv_skb.constprop.0.isra.0+0x10/0x10 [ib_core]
? srso_alias_return_thunk+0x5/0xfbef5
? srso_alias_return_thunk+0x5/0xfbef5
? netlink_deliver_tap+0x150/0xac0
netlink_unicast+0x47c/0x790
? __pfx_netlink_unicast+0x10/0x10
netlink_sendmsg+0x767/0xc30
? __pfx_netlink_sendmsg+0x10/0x10
? lock_release+0x1e0/0x280
__sys_sendto+0x339/0x390
? __pfx___sys_sendto+0x10/0x10
? srso_alias_return_thunk+0x5/0xfbef5
__x64_sys_sendto+0xe0/0x1c0
? do_syscall_64+0x81/0x6a0
? srso_alias_return_thunk+0x5/0xfbef5
? trace_hardirqs_on+0x18/0x160
do_syscall_64+0x115/0x6a0
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Allocated by task 436:
kasan_save_stack+0x33/0x60
kasan_save_track+0x14/0x30
__kasan_kmalloc+0xaa/0xb0
nvme_rdma_cm_handler+0xcbc/0x2914 [nvme_rdma]
cma_cm_event_handler+0xb2/0x390 [rdma_cm]
addr_handler+0x199/0x2b0 [rdma_cm]
process_one_req+0x113/0x650 [ib_core]
process_one_work+0x8d0/0x1870
worker_thread+0x575/0xf80
kthread+0x2e7/0x3c0
ret_from_fork+0x576/0x810
ret_from_fork_asm+0x1a/0x30
Freed by task 436:
kasan_save_stack+0x33/0x60
kasan_save_track+0x14/0x30
kasan_save_free_info+0x3b/0x60
__kasan_slab_free+0x5f/0x80
kfree+0x307/0x580
nvme_rdma_free_dev+0x16d/0x260 [nvme_rdma]
nvme_rdma_free_queue+0x6d/0x90 [nvme_rdma]
nvme_rdma_error_recovery_work+0x7f/0x110 [nvme_rdma]
process_one_work+0x8d0/0x1870
worker_thread+0x575/0xf80
kthread+0x2e7/0x3c0
ret_from_fork+0x576/0x810
ret_from_fork_asm+0x1a/0x30
Fixes: e87a911fed07 ("nvme-rdma: use ib_client API to detect device removal")
Assisted-by: Codex:gpt-5.5
Signed-off-by: Cen Zhang <zzzccc427 at gmail.com>
---
v2:
- Reworked the fix to take a temporary nvme_rdma_device reference during the
  device_list lookup instead of adding a cached ib_device field to struct
  nvme_rdma_ctrl.
- Changed the controller-list match to compare ctrl->device against the pinned
  wrapper while preserving the existing delete workqueue flush behavior.
drivers/nvme/host/rdma.c | 15 ++++++++-------
1 file changed, 8 insertions(+), 7 deletions(-)
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 6909e3542794..9efe84b9d1d5 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -2372,31 +2372,32 @@ static struct nvmf_transport_ops nvme_rdma_transport = {
 static void nvme_rdma_remove_one(struct ib_device *ib_device, void *client_data)
 {
 	struct nvme_rdma_ctrl *ctrl;
-	struct nvme_rdma_device *ndev;
-	bool found = false;
+	struct nvme_rdma_device *ndev = NULL;
+	struct nvme_rdma_device *tmp;
 
 	mutex_lock(&device_list_mutex);
-	list_for_each_entry(ndev, &device_list, entry) {
-		if (ndev->dev == ib_device) {
-			found = true;
+	list_for_each_entry(tmp, &device_list, entry) {
+		if (tmp->dev == ib_device && nvme_rdma_dev_get(tmp)) {
+			ndev = tmp;
 			break;
 		}
 	}
 	mutex_unlock(&device_list_mutex);
 
-	if (!found)
+	if (!ndev)
 		return;
 
 	/* Delete all controllers using this device */
 	mutex_lock(&nvme_rdma_ctrl_mutex);
 	list_for_each_entry(ctrl, &nvme_rdma_ctrl_list, list) {
-		if (ctrl->device->dev != ib_device)
+		if (ctrl->device != ndev)
 			continue;
 		nvme_delete_ctrl(&ctrl->ctrl);
 	}
 	mutex_unlock(&nvme_rdma_ctrl_mutex);
 
 	flush_workqueue(nvme_delete_wq);
+	nvme_rdma_dev_put(ndev);
 }
 
 static struct ib_client nvme_rdma_ib_client = {
--
2.43.0