[PATCH] nvme-tcp: handle failing req immediately for REQ_FAILFAST_DRIVER
Sagi Grimberg
sagi at grimberg.me
Thu Apr 7 05:04:42 PDT 2022
> When nvme_tcp_try_send fails to send a request, it usually defers the
> error handling to upper layers, but it ignores the REQ_FAILFAST_DRIVER
> flag.
>
> This can lead to very long shutdown delays when the user issues a
> 'nvme disconnect-all':
>
> [44888.710527] nvme nvme0: Removing ctrl: NQN "xxx"
> [44898.981684] nvme nvme0: failed to send request -32
> [44960.982977] nvme nvme0: queue 0: timeout request 0x18 type 4
> [44960.983099] nvme nvme0: Property Set error: 881, offset 0x14
>
> At timestamp [44898.981684] nvme_tcp_try_send fails to send, but we
> have to wait for the timeout handler to expire at [44960.982977]
> before we make any progress.
>
> With the REQ_FAILFAST_DRIVER flag honored, the log looks like this:
>
> [ 8999.227495] nvme nvme1: Removing ctrl: NQN "xxx"
> [ 9016.680447] nvme nvme1: failed to send request -32
> [ 9016.680477] nvme nvme1: Property Set error: 880, offset 0x14
>
> Signed-off-by: Daniel Wagner <dwagner at suse.de>
> ---
> drivers/nvme/host/tcp.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
> index 10fc45d95b86..f714d7ad38c0 100644
> --- a/drivers/nvme/host/tcp.c
> +++ b/drivers/nvme/host/tcp.c
> @@ -1145,9 +1145,11 @@ static int nvme_tcp_try_send(struct nvme_tcp_queue *queue)
> if (ret == -EAGAIN) {
> ret = 0;
> } else if (ret < 0) {
> + struct request *rq = blk_mq_rq_from_pdu(queue->request);
> dev_err(queue->ctrl->ctrl.device,
> "failed to send request %d\n", ret);
> - if (ret != -EPIPE && ret != -ECONNRESET)
> + if ((ret != -EPIPE && ret != -ECONNRESET) ||
> + rq->cmd_flags & REQ_FAILFAST_DRIVER)
I'm wondering if this will race with nvme_cancel_admin_tagset, which
can also complete the command... The whole assumption is that the
cancel iterator will be running as the socket is shut down by the peer
and error_recovery was triggered.
I think these shouldn't race because we shutdown the queue (and flush
the io_work) and only then cancel. But I've seen races of that
sort in the past...
More information about the Linux-nvme mailing list