[PATCH] nvme-tcp: handle failing req immediately for REQ_FAILFAST_DRIVER

Mon Apr 11 04:22:17 PDT 2022

On Mon, Apr 11, 2022 at 12:58:09PM +0300, Sagi Grimberg wrote:
> > > I see there is a shutdown_timeout which is 5 seconds on default. Seems
> > > not to trigger though.
> > 
> > Obviously, the shutdown_timeout is not connected to the register
> > writes. So we would need something like:
> 
> Looking at this, its probably not what we want...

I agree. We just have to make sure that we fail these register
read/writes on transport layer and don't requeue them.

> I've inspected the code and I don't think that we should check
> the return code at all, we should simply fail the request. If
> the io_work context is running in parallel with the the error
> recovery handler we have a bug that we need to fix.
> 
> How about this?

Yes this would work fine for my case.

> --
> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
> index 10fc45d95b86..6d59867bb275 100644
> --- a/drivers/nvme/host/tcp.c
> +++ b/drivers/nvme/host/tcp.c
> @@ -1147,8 +1147,7 @@ static int nvme_tcp_try_send(struct nvme_tcp_queue
> *queue)
>         } else if (ret < 0) {
>                 dev_err(queue->ctrl->ctrl.device,
>                         "failed to send request %d\n", ret);
> -               if (ret != -EPIPE && ret != -ECONNRESET)
> -                       nvme_tcp_fail_request(queue->request);
> +               nvme_tcp_fail_request(req);
>                 nvme_tcp_done_send_req(queue);
>         }
>         return ret;
> --
> 
> Adding Anton,
> 
> Anton, do you see an issue with this change? io_work is safely
> fenced from the cancellation iter so there is no reason for
> us not to fail the request here when the socket closes (or
> any other reason for that matter).