[PATCHv2] nvme-pci: fix timeout request state check

Keith Busch kbusch at kernel.org
Wed Jan 18 07:21:58 PST 2023


On Wed, Jan 18, 2023 at 08:33:30AM +0100, Christoph Hellwig wrote:
> On Tue, Jan 17, 2023 at 10:52:39PM -0700, Keith Busch wrote:
> > We're actually not batching here (no IOB in the timeout context), so we
> > are either:
> > 
> >   a. calling nvme_pci_complete_rq() inline with the cqe
> >   b. racing with smp ipi or softirq
> > 
> > In case (a), we will always see IDLE. In case (b), we are racing and may
> > see either COMPLETED or IDLE, so checking for just one of those isn't
> > enough. Since the only other possible state is STARTED, which was
> > guaranteed prior to entering the timeout handler, we can simply check
> > whether the request is still in that state after the poll to know
> > whether abort escalation is needed.
> 
> The point is still that "started" is the wrong check here and relies
> on an implementation detail.  I think we're better off with an explicit
> IDLE check and a big fat comment.
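
For reference, the states in question are blk-mq's request states; the
relevant helpers look roughly like this (paraphrased from
include/linux/blk-mq.h, exact details vary by kernel version):

---
enum mq_rq_state {
	MQ_RQ_IDLE	= 0,	/* not started, or already freed */
	MQ_RQ_IN_FLIGHT	= 1,	/* "started": set by blk_mq_start_request() */
	MQ_RQ_COMPLETE	= 2,	/* completion signalled, not yet freed */
};

static inline enum mq_rq_state blk_mq_rq_state(struct request *rq)
{
	return READ_ONCE(rq->state);
}

static inline int blk_mq_request_completed(struct request *rq)
{
	/* matches only the COMPLETE window, not a request already back to IDLE */
	return blk_mq_rq_state(rq) == MQ_RQ_COMPLETE;
}
--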

So you want the check to look like this instead?

---
@@ -1362,7 +1362,8 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req)
        else
                nvme_poll_irqdisable(nvmeq);

-       if (blk_mq_request_completed(req)) {
+       if (blk_mq_request_completed(req) ||
+           blk_mq_rq_state(req) == MQ_RQ_IDLE) {
                dev_warn(dev->ctrl.device,
                         "I/O %d QID %d timeout, completion polled\n",
                         req->tag, nvmeq->qid);
--

That's essentially a more complicated equivalent of what I have, but
it's fine with me if you think it's clearer.
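
For comparison, the v2 check is just the inverse test against the started
state (MQ_RQ_IN_FLIGHT, which blk_mq_start_request() sets), roughly:

---
	if (blk_mq_rq_state(req) != MQ_RQ_IN_FLIGHT) {
		dev_warn(dev->ctrl.device,
			 "I/O %d QID %d timeout, completion polled\n",
			 req->tag, nvmeq->qid);
		return BLK_EH_DONE;
	}
--

Since the request was guaranteed to be in MQ_RQ_IN_FLIGHT when the timeout
handler was entered, and can only move forward to COMPLETE and then IDLE,
"!= MQ_RQ_IN_FLIGHT" and "COMPLETE || IDLE" end up testing the same thing.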

Alternatively, I also considered moving the IDLE state setting to the
point where the request is actually freed, which might make more sense
and would work without changing the nvme driver:

---
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -713,6 +713,7 @@ static void __blk_mq_free_request(struct request *rq)
        struct blk_mq_hw_ctx *hctx = rq->mq_hctx;
        const int sched_tag = rq->internal_tag;

+       WRITE_ONCE(rq->state, MQ_RQ_IDLE);
        blk_crypto_free_request(rq);
        blk_pm_mark_last_busy(rq);
        rq->mq_hctx = NULL;
@@ -741,7 +742,6 @@ void blk_mq_free_request(struct request *rq)

        rq_qos_done(q, rq);

-       WRITE_ONCE(rq->state, MQ_RQ_IDLE);
        if (req_ref_put_and_test(rq))
                __blk_mq_free_request(rq);
 }
--
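
The reason that would be enough on its own: the timeout iteration pins the
request with its own reference before calling ->timeout(), so an inline
completion from the poll drops its reference in blk_mq_free_request()
without ever reaching __blk_mq_free_request(), and with the hunk above the
request stays in MQ_RQ_COMPLETE for as long as nvme_timeout() is looking
at it, which the existing blk_mq_request_completed() check already covers.
Very roughly (paraphrased, not the literal blk-mq code; the
timeout_one_request() wrapper is made up for illustration):

---
static void timeout_one_request(struct request *rq)
{
	/* the expired-tag iteration takes a reference before touching rq */
	if (!req_ref_inc_not_zero(rq))
		return;		/* already freed, nothing to time out */

	rq->q->mq_ops->timeout(rq);	/* e.g. nvme_timeout(), which may poll
					 * the CQ and complete rq inline;
					 * return value handling omitted */

	/*
	 * If the poll completed rq inline, blk_mq_free_request() already
	 * dropped the completion's reference; with the change above the
	 * MQ_RQ_IDLE transition now happens here, only after ->timeout()
	 * has finished inspecting the state.
	 */
	if (req_ref_put_and_test(rq))
		__blk_mq_free_request(rq);
}
--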


