[PATCH 1/1] nvme: fix use-after-free of admin queue via stale pointer
Chaitanya Kulkarni
chaitanyak at nvidia.com
Thu Oct 30 01:12:53 PDT 2025
On 10/29/25 17:39, Keith Busch wrote:
> On Wed, Oct 29, 2025 at 03:08:53PM -0600, Casey Chen wrote:
>> Fix this by taking an additional reference on the admin queue during
>> namespace allocation and releasing it during namespace cleanup.
> Since the namespaces already hold references on the controller, would it
> be simpler to move the controller's final blk_put_queue to the final
> ctrl free? This should have the same lifetime as your patch, but with
> simpler ref counting:
>
> ---
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index fa4181d7de736..0b83d82f67e75 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -4901,7 +4901,6 @@ void nvme_remove_admin_tag_set(struct nvme_ctrl *ctrl)
> */
> nvme_stop_keep_alive(ctrl);
> blk_mq_destroy_queue(ctrl->admin_q);
> - blk_put_queue(ctrl->admin_q);
> if (ctrl->ops->flags & NVME_F_FABRICS) {
> blk_mq_destroy_queue(ctrl->fabrics_q);
> blk_put_queue(ctrl->fabrics_q);
> @@ -5045,6 +5044,7 @@ static void nvme_free_ctrl(struct device *dev)
> container_of(dev, struct nvme_ctrl, ctrl_device);
> struct nvme_subsystem *subsys = ctrl->subsys;
>
> + blk_put_queue(ctrl->admin_q);
> if (!subsys || ctrl->instance != subsys->instance)
> ida_free(&nvme_instance_ida, ctrl->instance);
> nvme_free_cels(ctrl);
> --
>
above is much better approach that doesn't rely on taking extra
ref count but using existing count to protect the UAF.
I've added required comments that are very much needed here,
totally untested :-
nvme: fix UAF when accessing admin queue after removal
Fix a use-after-free where userspace IOCTLs can access ctrl->admin_q
after it has been freed during controller removal.
The Race Condition:
Thread 1 (userspace IOCTL) Thread 2 (sysfs remove)
-------------------------- -------------------
open(/dev/nvme0n1) -> fd=3
ioctl(3, NVME_IOCTL_ADMIN_CMD)
nvme_ioctl()
nvme_user_cmd()
echo 1 > .../remove
pci_device_remove()
nvme_remove()
nvme_remove_admin_tag_set()
blk_put_queue(admin_q)
[RCU grace period]
blk_free_queue(admin_q)
kmem_cache_free() <- FREED
nvme_submit_user_cmd(ns->ctrl->admin_q) <- STALE POINTER
blk_mq_alloc_request(admin_q)
blk_queue_enter(admin_q)
*** USE-AFTER-FREE ***
The admin queue is freed in nvme_remove_admin_tag_set() while userspace
may still hold open file descriptors to namespace devices. These open
file descriptors can issue IOCTLs that dereference ctrl->admin_q after
it has been freed.
Defer blk_put_queue(ctrl->admin_q) from nvme_remove_admin_tag_set() to
nvme_free_ctrl(). Since each namespace holds a controller reference via
nvme_get_ctrl()/nvme_put_ctrl(), the controller will only be freed after
all namespaces (and their open file descriptors) are released. This
guarantees admin_q remains allocated while it may still be accessed.
After blk_mq_destroy_queue() in nvme_remove_admin_tag_set(), the queue
is marked dying (QUEUE_FLAG_DYING), so new IOCTL attempts fail safely
at blk_queue_enter() with -ENODEV. The queue structure remains valid for
pointer dereference until nvme_free_ctrl() is called.
---
drivers/nvme/host/core.c | 22 +++++++++++++++++++++-
1 file changed, 21 insertions(+), 1 deletion(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 734ad725e6f4..dbbcf99dbef8 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4897,7 +4897,19 @@ void nvme_remove_admin_tag_set(struct nvme_ctrl
*ctrl)
*/
nvme_stop_keep_alive(ctrl);
blk_mq_destroy_queue(ctrl->admin_q);
- blk_put_queue(ctrl->admin_q);
+ /**
+ * Defer blk_put_queue() to nvme_free_ctrl() to prevent use-after-free.
+ *
+ * Userspace may hold open file descriptors to namespace devices and
+ * issue IOCTLs that dereference ctrl->admin_q after controller removal
+ * starts. Since each namespace holds a controller reference, deferring
+ * the final queue release ensures admin_q remains allocated until all
+ * namespace references are released.
+ *
+ * blk_mq_destroy_queue() above marks the queue dying
(QUEUE_FLAG_DYING),
+ * causing new requests to fail at blk_queue_enter() with -ENODEV while
+ * keeping the structure valid for pointer access.
+ */
if (ctrl->ops->flags & NVME_F_FABRICS) {
blk_mq_destroy_queue(ctrl->fabrics_q);
blk_put_queue(ctrl->fabrics_q);
@@ -5041,6 +5053,14 @@ static void nvme_free_ctrl(struct device *dev)
container_of(dev, struct nvme_ctrl, ctrl_device);
struct nvme_subsystem *subsys = ctrl->subsys;
+ /**
+ * Release admin_q's final reference. All namespace references have
+ * been released at this point. NULL check is needed for to handle
+ * allocation failure in nvme_alloc_admin_tag_set().
+ */
+ if (ctrl->admin_q)
+ blk_put_queue(ctrl->admin_q);
+
if (!subsys || ctrl->instance != subsys->instance)
ida_free(&nvme_instance_ida, ctrl->instance);
nvme_free_cels(ctrl);
-ck
More information about the Linux-nvme
mailing list