[RFC PATCH] nvme-tcp: rerun io_work if req_list is not empty
Sagi Grimberg
sagi at grimberg.me
Mon May 17 17:42:18 PDT 2021
> A possible race condition exists where a request enqueued from
> nvme_tcp_handle_r2t() will not be observed by nvme_tcp_send_all() if
> it happens to be running. The driver relies on
> io_work to send the enqueued request when it runs again, but the
> concurrently running nvme_tcp_send_all() may not have released the
> send_mutex at that time. If no future commands are enqueued to re-kick
> the io_work, the request will timeout in the SEND_H2C state, resulting
> in a timeout error like:
>
> nvme nvme0: queue 1: timeout request 0x3 type 6
>
> Ensure the io_work continues to run as long as the req_list is not
> empty.
There is a version of this patch that I personally suggested before;
however, I couldn't explain why that should happen...
nvme_tcp_send_all tries to send everything it has queued, meaning it
should either be able to send everything, or it should see a full socket
buffer. But if the socket buffer is full, there should be a
.write_space() sk callback triggering when the socket buffer frees up
space... Maybe there is a chance that write_space triggered and started
executing while send_mutex was still taken?
Can we try to catch whether that is the case?
>
> Signed-off-by: Keith Busch <kbusch at kernel.org>
> ---
> Marking this RFC because the timeout is difficult to recreate, so
> difficult to verify the patch. The patch was created purely from code
> inspection, so I'm just hoping for feedback on my analysis right now.
>
> drivers/nvme/host/tcp.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
> index 0222e23f5936..d07eb13d8713 100644
> --- a/drivers/nvme/host/tcp.c
> +++ b/drivers/nvme/host/tcp.c
> @@ -1140,7 +1140,8 @@ static void nvme_tcp_io_work(struct work_struct *w)
> pending = true;
> else if (unlikely(result < 0))
> break;
> - }
> + } else
> + pending = !llist_empty(&queue->req_list);
In my version, pending was unconditionally set to true...