[RFC PATCH] nvme-tcp: rerun io_work if req_list is not empty
Keith Busch
kbusch at kernel.org
Mon May 17 18:38:04 PDT 2021
On Mon, May 17, 2021 at 05:42:18PM -0700, Sagi Grimberg wrote:
> > A possible race condition exists where a request to send data
> > enqueued from nvme_tcp_handle_r2t() will not be observed by
> > nvme_tcp_send_all() if it happens to be running. The driver relies on
> > io_work to send the enqueued request when it runs again, but the
> > concurrently running nvme_tcp_send_all() may not have released the
> > send_mutex at that time. If no future commands are enqueued to re-kick
> > the io_work, the request will time out in the SEND_H2C state, resulting
> > in a timeout error like:
> >
> > nvme nvme0: queue 1: timeout request 0x3 type 6
> >
> > Ensure the io_work continues to run as long as the req_list is not
> > empty.
>
> There is a version of this patch that I personally suggested before,
> but I couldn't explain why that should happen...
>
> nvme_tcp_send_all tries to send everything it has queued, which means
> it should either be able to send everything, or it should see a full
> socket buffer. But if the socket buffer is full, there should be a
> .write_space() sk callback triggering when the socket buffer evacuates
> space... Maybe there is a chance that write_space triggered, started
> execution, and that the send_mutex is still taken?
nvme_tcp_send_all() breaks out of the loop if nvme_tcp_fetch_request()
returns NULL. If that happens just before io_work calls
nvme_tcp_handle_r2t() to enqueue the H2C request, nvme_tcp_send_all()
will not see that request, but send_mutex is still held. We're counting
on io_work to run again to handle sending the H2C data in that case.
Unlikely as it sounds, if the same nvme_tcp_send_all() context is still
holding the send_mutex when io_work gets back to trying to take it, how
will the data get sent?
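
If it helps to make that concrete, the change I have in mind is
basically to recheck the request list at the end of io_work and re-arm
the work item if anything is still queued. Rough sketch only, not the
final diff; the field and helper names here (req_list, io_cpu,
nvme_tcp_wq, queue_work_on) reflect my reading of
drivers/nvme/host/tcp.c:

static void nvme_tcp_io_work(struct work_struct *w)
{
	struct nvme_tcp_queue *queue =
		container_of(w, struct nvme_tcp_queue, io_work);
	unsigned long deadline = jiffies + msecs_to_jiffies(1);

	do {
		/* existing send/receive handling, elided */
	} while (!time_after(jiffies, deadline)); /* quota exhausted */

	/*
	 * Rerun io_work if req_list is not empty: an H2C request queued
	 * by nvme_tcp_handle_r2t() while another context still held
	 * send_mutex would otherwise sit there until the next command
	 * happens to kick the queue.
	 */
	if (!llist_empty(&queue->req_list))
		queue_work_on(queue->io_cpu, nvme_tcp_wq, &queue->io_work);
}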
> Can we maybe try to catch if that is the case?
Do you have a better idea on how we can catch this? I think there has
been only one sighting of this so far, and it looks like it took a long
time to encounter, but we will try again if you have a proposal.
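
In the meantime, one thing we could try on our side is some throwaway
instrumentation to confirm the theory, roughly like the hack below.
Debug only and inherently racy; mutex_is_locked() and
nvme_tcp_queue_id() are from my reading of the sources, and the message
text is made up:

	/*
	 * Debug-only check at the point where io_work exhausts its
	 * quota: warn if requests are still queued while send_mutex is
	 * held elsewhere, i.e. the window where we currently depend on
	 * a future command to re-kick io_work.  Racy, but enough to
	 * show whether that window is ever hit.
	 */
	if (!llist_empty(&queue->req_list) &&
	    mutex_is_locked(&queue->send_mutex))
		pr_warn_ratelimited("nvme-tcp: queue %d: req_list not empty with send_mutex held\n",
				    nvme_tcp_queue_id(queue));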