[PATCH] nvme-pci: check for valid request when polling for completions

Tue Sep 3 08:14:27 PDT 2024

On Tue, Sep 03, 2024 at 08:25:08AM +0200, Hannes Reinecke wrote:
> On 9/2/24 19:04, Sagi Grimberg wrote:
> > On 02/09/2024 16:07, Hannes Reinecke wrote:
> > > When polling for completions from the timeout handler we traverse
> > > over _all_ cqes, and then fetch the request via blk_mq_tag_to_rq().
> > > Unfortunately that function will always return a request, even if
> > > that request is already completed.
> > > So we need to check if the command is still in flight before
> > > attempting to complete it.

So the very same command was completed in some other context? We've
disabled the queue's interrupt here, so there should be no other context
that can concurrently complete it. The timeout poll is supposed to check
only unseen cqes, not "all" of them. Is disable_irq() not a sufficient
barrier for accessing the cq head or something?
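
To illustrate the "unseen cqes only" point, here is a minimal standalone
userspace sketch of phase-tag based completion queue consumption. It is not
the driver code: every sim_* name, the queue depth, and the struct layout are
made up for illustration, and it only assumes the usual NVMe convention that
bit 0 of the status field is the phase tag. The consumer stops at the first
entry whose phase does not match the phase it expects, so an entry that was
already consumed is not visited again until the device overwrites it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SIM_CQ_DEPTH 4	/* hypothetical queue depth, kept tiny for the demo */

struct sim_cqe {
	uint16_t command_id;
	uint16_t status;	/* bit 0 is the phase tag */
};

struct sim_cq {
	struct sim_cqe cqes[SIM_CQ_DEPTH];
	uint16_t head;		/* host side: next entry to look at */
	uint8_t phase;		/* host side: phase that marks a new entry */
	uint16_t tail;		/* simulated device: next entry to write */
	uint8_t dev_phase;	/* simulated device: phase it writes */
};

static void sim_cq_init(struct sim_cq *cq)
{
	memset(cq, 0, sizeof(*cq));
	cq->phase = 1;		/* zeroed entries carry phase 0, i.e. "not new" */
	cq->dev_phase = 1;
}

/* Simulated device posting one completion at its tail. */
static void sim_post_cqe(struct sim_cq *cq, uint16_t command_id)
{
	cq->cqes[cq->tail].command_id = command_id;
	cq->cqes[cq->tail].status = cq->dev_phase;
	if (++cq->tail == SIM_CQ_DEPTH) {
		cq->tail = 0;
		cq->dev_phase ^= 1;	/* phase flips on every wrap */
	}
}

/* True only if the entry at head was written since we last passed it. */
static bool sim_cqe_pending(const struct sim_cq *cq)
{
	return (cq->cqes[cq->head].status & 1) == cq->phase;
}

static void sim_update_cq_head(struct sim_cq *cq)
{
	if (++cq->head == SIM_CQ_DEPTH) {
		cq->head = 0;
		cq->phase ^= 1;
	}
}

/*
 * Consume unseen entries only, recording their command ids in seen[].
 * Returns the number of entries consumed (at most max).
 */
static int sim_poll_cq(struct sim_cq *cq, uint16_t *seen, int max)
{
	int found = 0;

	assert(max > 0);
	while (found < max && sim_cqe_pending(cq)) {
		seen[found++] = cq->cqes[cq->head].command_id;
		sim_update_cq_head(cq);
	}
	return found;
}
```

With this scheme, polling twice in a row without new completions finds nothing
the second time, because head and the expected phase have already moved past
the consumed entries. The sketch is single-threaded and deliberately ignores
memory ordering and concurrent pollers, which is exactly the part the question
above is asking about.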