[PATCH] nvme-pci: check for valid request when polling for completions

Tue Sep 3 13:07:29 PDT 2024

On Tue, Sep 03, 2024 at 09:14:27AM -0600, Keith Busch wrote:
> On Tue, Sep 03, 2024 at 08:25:08AM +0200, Hannes Reinecke wrote:
> > On 9/2/24 19:04, Sagi Grimberg wrote:
> > > On 02/09/2024 16:07, Hannes Reinecke wrote:
> > > > When polling for completions from the timeout handler we traverse
> > > > over _all_ cqes, and the fetching the request via blk_mq_tag_to_rq().
> > > > Unfortunately that function will always return a request, even if
> > > > that request is already completed.
> > > > So we need to check if the command is still in flight before
> > > > attempting to complete it.
> 
> So the very same command was completed in some other context? We've
> disabled the queue's interrupt here, there should be no other context
> that can concurrently complete it. The timeout poll check is supposed to
> check only unseen cqes, not "all" of them. Is disable_irq() not a
> sufficient barrier for accessing the cq head or something?

Ooo, I think I see a problem. Does your device have more than one
namespace? I think we need to lock this queue for that condition because
the timeout work executes per-namespace, and we're not locking that
today. If you do have a muliti-namespace controller, does the below fix
your observation?

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 6cd9395ba9ec3..2c73ccd21afe3 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1109,9 +1109,11 @@ static void nvme_poll_irqdisable(struct nvme_queue *nvmeq)
 
 	WARN_ON_ONCE(test_bit(NVMEQ_POLLED, &nvmeq->flags));
 
+	spin_lock(&nvmeq->cq_poll_lock);
 	disable_irq(pci_irq_vector(pdev, nvmeq->cq_vector));
 	nvme_poll_cq(nvmeq, NULL);
 	enable_irq(pci_irq_vector(pdev, nvmeq->cq_vector));
+	spin_unlock(&nvmeq->cq_poll_lock);
 }
 
 static int nvme_poll(struct blk_mq_hw_ctx *hctx, struct io_comp_batch *iob)