[PATCH] nvme: reject completions for requests that are not in flight

Chao S coshi036 at gmail.com
Mon May 25 13:27:29 PDT 2026


Hi,

Since posting this I reproduced a more severe manifestation of the same
bug and confirmed the patch handles it; sharing as extra justification.

The commit message covers the freed / never-dispatched case (the NULL
rq->mq_hctx dereference).  When the stale command id instead maps to a
tag that has already been *reused*, the driver completes an unrelated,
still-in-flight request -- a use-after-free.  Under fuzzing (a device
that replays and reorders completions) this did not show up as a clean
NULL deref but as cross-subsystem memory corruption: general protection
faults in mtree_range_walk(), unlink_anon_vmas() and the slub freelist,
in unrelated tasks (modprobe, systemd-udevd, ...).  The trigger was a
stale completion delivered for a request that a concurrent controller
reset had just freed.
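
For illustration only, here is a rough user-space sketch (not kernel
code) of why the 4-bit generation counter mentioned in the commit
message quoted below cannot rule out a replayed command id on its own.
It assumes the command id packs the counter into the upper 4 bits and
the tag into the lower 12, and that the counter advances once per reuse
of the tag; the helper names and the genctr value are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: upper 4 bits = generation counter, lower 12 = tag. */
#define GENCTR_BITS 4u
#define TAG_BITS    12u
#define GENCTR_MASK ((1u << GENCTR_BITS) - 1u)  /* 0xf   */
#define TAG_MASK    ((1u << TAG_BITS) - 1u)     /* 0xfff */

/* Build a command id from a (free-running) counter and a tag. */
static uint16_t make_cid(unsigned genctr, unsigned tag)
{
    assert(tag <= TAG_MASK);
    return (uint16_t)(((genctr & GENCTR_MASK) << TAG_BITS) | tag);
}

static unsigned cid_genctr(uint16_t cid)
{
    return (cid >> TAG_BITS) & GENCTR_MASK;
}

static unsigned cid_tag(uint16_t cid)
{
    return cid & TAG_MASK;
}

/*
 * Count how many times the tag must be reused before a stale command
 * id held by the device matches the driver's current one again.
 */
static unsigned reuses_until_match(uint16_t stale_cid)
{
    unsigned genctr = cid_genctr(stale_cid);
    unsigned tag = cid_tag(stale_cid);
    unsigned reuses = 0;

    do {
        genctr++;   /* assumed: one increment per reuse of the tag */
        reuses++;
    } while (make_cid(genctr, tag) != stale_cid);

    return reuses;
}
```

Under these assumptions a stale id for tag 0x1c0 becomes valid again
after 16 reuses of that tag, so a device that replays ids only has to
wait for the counter to wrap.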

To confirm the fix addresses this, I rebuilt the kernel with the patch
and re-ran the same workload for ~10h.  The guard now rejects the
offending completion instead of acting on it:

  nvme nvme0: resetting controller
  nvme nvme0: completion for request 0x1c0 not in flight
  nvme nvme0: invalid id 448 completed on queue 2

and no use-after-free / corruption recurred over the run.
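
As a simplified user-space model of the check ordering (a sketch, not
the driver code), the lookup can be pictured as below.  The struct, the
enum, MODEL_NR_TAGS and model_find_rq() are hypothetical stand-ins; the
12-bit tag / 4-bit counter split is the same assumption as the commit
message's "4 bits wide" genctr implies for a 16-bit command id.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the block layer request states. */
enum rq_state { RQ_IDLE, RQ_IN_FLIGHT, RQ_COMPLETE };

struct model_rq {
    enum rq_state state;
    unsigned genctr;    /* free-running; only the low 4 bits are compared */
};

#define MODEL_NR_TAGS 4u

/*
 * Resolve a device-supplied command id to a request, or return NULL
 * if the completion must be rejected.
 */
static struct model_rq *model_find_rq(struct model_rq *rqs, uint16_t cid)
{
    unsigned tag = cid & 0xfffu;
    unsigned genctr = (cid >> 12) & 0xfu;
    struct model_rq *rq;

    if (tag >= MODEL_NR_TAGS)
        return NULL;            /* no such tag */

    rq = &rqs[tag];

    /* State check first: it does not depend on the device's input. */
    if (rq->state != RQ_IN_FLIGHT)
        return NULL;            /* not outstanding: reject */

    /* Counter check second: it only narrows the window. */
    if ((rq->genctr & 0xfu) != genctr)
        return NULL;            /* stale generation: reject */

    return rq;
}
```

In this model a completion whose counter happens to match is still
rejected when the request at that tag is idle or already completed,
which is the case the log above shows.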

The code is unchanged; I'm happy to fold this into the commit message
as a v2 if you'd prefer it spelled out there.

Thanks,
Chao

On Fri, May 22, 2026 at 11:30 AM Chao Shi <coshi036 at gmail.com> wrote:
>
> nvme_find_rq() resolves a device-supplied command id to a request with
> blk_mq_tag_to_rq(), which returns whatever request last used that tag -
> possibly one that is no longer in flight (freed, or never dispatched and
> thus with a NULL rq->mq_hctx).  Commit e7006de6c238 ("nvme: code
> command_id with a genctr for use-after-free validation") guards against
> this, but its generation counter is only 4 bits wide and can be matched
> by a malfunctioning or malicious device replaying command ids.  The
> driver then completes a request that is not outstanding, dereferencing a
> NULL rq->mq_hctx or double-completing a command:
>
>   Oops: general protection fault ... KASAN: null-ptr-deref
>   RIP: blk_mq_complete_request_remote+0xe5/0xa80 block/blk-mq.c:1319
>    nvme_handle_cqe drivers/nvme/host/pci.c:1418 [inline]
>    nvme_poll_cq drivers/nvme/host/pci.c:1449
>    nvme_irq drivers/nvme/host/pci.c:1463
>
> Require the request to be in flight before completing it.  The check uses
> the request state, so it also covers controllers with
> NVME_QUIRK_SKIP_CID_GEN.
>
> Found by FuzzNvme (Syzkaller with FEMU fuzzing framework).
>
> Acked-by: Sungwoo Kim <iam at sung-woo.kim>
> Acked-by: Dave Tian <daveti at purdue.edu>
> Acked-by: Weidong Zhu <weizhu at fiu.edu>
> Signed-off-by: Chao Shi <coshi036 at gmail.com>
> ---
>  drivers/nvme/host/nvme.h | 11 +++++++++++
>  1 file changed, 11 insertions(+)
>
> diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
> index 9a5f28c5103c..3a525c1dc818 100644
> --- a/drivers/nvme/host/nvme.h
> +++ b/drivers/nvme/host/nvme.h
> @@ -615,6 +615,17 @@ static inline struct request *nvme_find_rq(struct blk_mq_tags *tags,
>                         tag);
>                 return NULL;
>         }
> +       /*
> +        * blk_mq_tag_to_rq() returns whatever request last used this tag, which
> +        * may no longer be in flight if the device reports a bogus command id.
> +        * Completing it would deref a NULL rq->mq_hctx or double-complete a
> +        * command; the 4-bit genctr below only narrows the window.
> +        */
> +       if (unlikely(blk_mq_rq_state(rq) != MQ_RQ_IN_FLIGHT)) {
> +               dev_err(nvme_req(rq)->ctrl->device,
> +                       "completion for request %#x not in flight\n", tag);
> +               return NULL;
> +       }
>         if (unlikely(nvme_genctr_mask(nvme_req(rq)->genctr) != genctr)) {
>                 dev_err(nvme_req(rq)->ctrl->device,
>                         "request %#x genctr mismatch (got %#x expected %#x)\n",
> --
> 2.43.0
>


