[PATCH] nvme-tcp: handle failing req immediately for REQ_FAILFAST_DRIVER

Sagi Grimberg sagi at grimberg.me
Thu Apr 7 05:04:42 PDT 2022


> When nvme_tcp_try_send fails to send a request, it usually defers the
> error handling to upper layers, but it ignores the REQ_FAILFAST_DRIVER
> flag.
> 
> This can lead to very long shutdown delays when the user issues a
> 'nvme disconnect-all':
> 
>   [44888.710527] nvme nvme0: Removing ctrl: NQN "xxx"
>   [44898.981684] nvme nvme0: failed to send request -32
>   [44960.982977] nvme nvme0: queue 0: timeout request 0x18 type 4
>   [44960.983099] nvme nvme0: Property Set error: 881, offset 0x14
> 
> At timestamp [44898.981684] nvme_tcp_try_send fails to send, but we have
> to wait for the timeout handler to expire at [44960.982977] before we
> make any progress.
> 
> By honoring the REQ_FAILFAST_DRIVER flag, the log looks like this:
> 
>   [ 8999.227495] nvme nvme1: Removing ctrl: NQN "xxx"
>   [ 9016.680447] nvme nvme1: failed to send request -32
>   [ 9016.680477] nvme nvme1: Property Set error: 880, offset 0x14
> 
> Signed-off-by: Daniel Wagner <dwagner at suse.de>
> ---
>   drivers/nvme/host/tcp.c | 4 +++-
>   1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
> index 10fc45d95b86..f714d7ad38c0 100644
> --- a/drivers/nvme/host/tcp.c
> +++ b/drivers/nvme/host/tcp.c
> @@ -1145,9 +1145,11 @@ static int nvme_tcp_try_send(struct nvme_tcp_queue *queue)
>   	if (ret == -EAGAIN) {
>   		ret = 0;
>   	} else if (ret < 0) {
> +		struct request *rq = blk_mq_rq_from_pdu(queue->request);
>   		dev_err(queue->ctrl->ctrl.device,
>   			"failed to send request %d\n", ret);
> -		if (ret != -EPIPE && ret != -ECONNRESET)
> +		if ((ret != -EPIPE && ret != -ECONNRESET) ||
> +		    rq->cmd_flags & REQ_FAILFAST_DRIVER)
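The condition the patch changes boils down to a small predicate: complete the request immediately for any send error other than a connection-level failure (-EPIPE/-ECONNRESET, which error recovery will pick up), and now also for fast-fail requests regardless of the error. A standalone sketch of that predicate (the REQ_FAILFAST_DRIVER bit value here is illustrative, not the real kernel definition):

```c
#include <errno.h>
#include <stdbool.h>

/* Illustrative flag bit; the real value lives in include/linux/blk_types.h. */
#define REQ_FAILFAST_DRIVER (1u << 10)

/*
 * Model of the post-patch decision in nvme_tcp_try_send's error path:
 * return true when the failed request should be completed right away
 * instead of waiting for error recovery / the request timeout.
 */
static bool complete_immediately(int ret, unsigned int cmd_flags)
{
	/* Connection-level errors normally defer to error recovery ... */
	if (ret != -EPIPE && ret != -ECONNRESET)
		return true;
	/* ... unless the driver asked for fast failure on this request. */
	return cmd_flags & REQ_FAILFAST_DRIVER;
}
```

With the old code, an -EPIPE on a fast-fail request (such as the Property Set during disconnect) returned false here and the command sat until the timeout fired.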

I'm wondering if this will race with nvme_cancel_admin_tagset, which
can also complete the command... The whole assumption is that the
cancel iterator will be running as the socket is shut down by the peer
and error_recovery has been triggered.

I think these shouldn't race, because we shut down the queue (and flush
the io_work) and only then cancel. But I've seen races of that
sort in the past...



