[PATCH] nvme-tcp: handle failing req immediately for REQ_FAILFAST_DRIVER

Daniel Wagner dwagner at suse.de
Thu Apr 7 06:38:06 PDT 2022


On Thu, Apr 07, 2022 at 04:04:44PM +0300, Sagi Grimberg wrote:
>
> > > > diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
> > > > index 10fc45d95b86..f714d7ad38c0 100644
> > > > --- a/drivers/nvme/host/tcp.c
> > > > +++ b/drivers/nvme/host/tcp.c
> > > > @@ -1145,9 +1145,11 @@ static int nvme_tcp_try_send(struct nvme_tcp_queue *queue)
> > > >         if (ret == -EAGAIN) {
> > > >                 ret = 0;
> > > >         } else if (ret < 0) {
> > > > +		struct request *rq = blk_mq_rq_from_pdu(queue->request);
> > > >                 dev_err(queue->ctrl->ctrl.device,
> > > >                         "failed to send request %d\n", ret);
> > > > -		if (ret != -EPIPE && ret != -ECONNRESET)
> > > > +		if ((ret != -EPIPE && ret != -ECONNRESET) ||
> > > > +		    rq->cmd_flags & REQ_FAILFAST_DRIVER)
> > >
> > > I'm wandering if this will race with nvme_cancel_admin_tagset which
> > > can also complete the command... The whole assumption is that the
> > > cancel iterator will be running as the socket is shutdown by the peer
> > > and error_recovery was triggered.
> > >
> > > I think these shouldn't race because we shutdown the queue (and flush
> > > the io_work) and only then cancel. But I've seen races of that
> > > sort in the past...
> >
> > This particular request is issued by nvme_shutdown_ctrl() so we know
> > exactly where we are. But this might not be the case for any other
> > REQ_FAILFAST_DRIVER flagged request. So this might introduce the race
> > you are pointing out.
> >
> > The option is see is what Margin suggested in
> >
> > http://lists.infradead.org/pipermail/linux-nvme/2021-September/027867.html
>
> Don't know if I like the state check. Why this does not apply in a ctrl
> reset?

Understood. Would something like this work

      struct request *rq = blk_mq_rq_from_pdu(queue->request);
      bool queue_ready = test_bit(NVME_TCP_Q_LIVE, &queue->flags);

      dev_err(queue->ctrl->ctrl.device,
              "failed to send request %d\n", ret);
      if ((ret != -EPIPE && ret != -ECONNRESET) ||
          (!queue_ready && rq->cmd_flags & REQ_FAILFAST_DRIVER))
              nvme_tcp_fail_request(queue->request);

?



More information about the Linux-nvme mailing list