[PATCH] nvme-pci: fix potential I/O hang when CQ is full

Junnan Zhang zhangjn_dev at 163.com
Thu Feb 12 01:42:36 PST 2026


On Wed, 11 Feb 2026 05:27:50 -0700, Keith Busch wrote:
> On Wed, Feb 11, 2026 at 05:47:44PM +0800, Junnan Zhang wrote:
> > On Tue, 10 Feb 2026 16:57:12 +0100, Christoph Hellwig wrote:
> > 
> > > We can't update the CQ head before consuming the CQEs, otherwise
> > > the device can reuse them.  And devices must not discard completions
> > > when there is no completion queue entry, nvme does allow SQs and CQs
> > > to be smaller than the number of outstanding commands.
> > 
> > Updating the CQ head before consuming the CQE would not cause the device to 
> > reuse these entries, as new commands can only be submitted by the driver after
> > the CQE is consumed. Therefore, the device does not have the opportunity 
> > to reuse these entries.
> 
> That's just an artifact of how this host implementation constrains its
> tag space. It's not a reflection of how the NVMe protocol fundamentally
> works.
> 
> A full queue is not an error. It's a spec defined condition that the
> submitter just has to deal with. The protocol was specifically made to
> allow scenarios for dispatching more outstanding commands than the
> queues can hold. Your controller is broken.

Thank you very much. I understand your point. According to Section 3.3.1.2.1
Completion Queue Flow Control in the NVMe specification:

    If there are no free slots in a Completion Queue, then the controller 
    shall not post status to that Completion Queue until slots become 
    available. In this case, the controller may stop processing additional 
    submission queue entries associated with the affected Completion Queue 
    until slots become available. The controller shall continue processing 
    for other Submission Queues not associated with the affected Completion 
    Queue.

Thus, a full queue is not an error. It is a condition defined by the specification
that the submitter must handle accordingly.
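The flow-control rule quoted above can be modeled as a simple ring-buffer check. This is an illustrative sketch only, not actual controller or SPDK code; all names are made up. It assumes the common convention that head == tail means empty, so a queue of `size` entries holds at most size - 1 posted completions:

```c
/* Sketch of controller-side CQ flow control per NVMe spec
 * section 3.3.1.2.1.  Illustrative only; names are invented.
 */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct cq_state {
	uint16_t head;	/* last head value the host wrote to the doorbell */
	uint16_t tail;	/* controller's next posting slot */
	uint16_t size;	/* number of CQ entries */
};

/* True if the controller may post one more completion. */
static bool cq_has_free_slot(const struct cq_state *cq)
{
	return ((uint16_t)(cq->tail + 1) % cq->size) != cq->head;
}

/*
 * When the CQ is full, the controller shall not post status (and may
 * stop fetching further SQ entries bound to this CQ); it resumes once
 * the host consumes CQEs and advances head via the CQ head doorbell.
 */
static bool try_post_completion(struct cq_state *cq)
{
	if (!cq_has_free_slot(cq))
		return false;	/* hold the completion; never overwrite */
	/* ... write the CQE at cq->tail, toggle the phase bit, signal ... */
	cq->tail = (uint16_t)(cq->tail + 1) % cq->size;
	return true;
}
```

The key point the sketch captures is that "full" is a normal return path, not an error: the controller simply waits for the host to free slots, which is exactly the condition the spec says the submitter has to deal with.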

In practice, SPDK vfio-user also handles this condition, as shown in the
following change:

    https://review.spdk.io/c/spdk/spdk/+/25473

During my testing, which involved repeatedly attaching and detaching an NVMe drive,
I observed the following:
1. Latest kernel version 6.19 + unmodified SPDK: the I/O hang occurs.
2. Latest kernel version 6.19 + modified SPDK: no issues.
3. Latest kernel version 6.19 with an NVMe patch + unmodified SPDK: no issues.

Test Environment:
A virtual machine uses SPDK vfio-user to pass an NVMe drive through to the guest. The VM
has 64 vCPUs, and the backend supports an NVMe I/O queue depth of at least 32 (since the
admin queue depth is 32, the issue only reproduces with a queue depth >= 32). The issue
occurs when the drive is repeatedly attached to and detached from the VM on the host;
reproducing it typically takes about 10 cycles. Each cycle consists of the following steps:
1. virsh attach-device <VM> <disk.xml>
2. sleep 1.5
3. virsh detach-device <VM> <disk.xml>

Given the third observation, that kernel 6.19 with an NVMe patch plus unmodified SPDK
does not hit the issue, I was wondering whether modifications to the NVMe driver are necessary.
Your expert guidance would be greatly appreciated.

Best regards,
Junnan Zhang



