[RFC PATCH] nvme-tcp: rerun io_work if req_list is not empty

Sagi Grimberg sagi at grimberg.me
Mon May 17 20:02:41 PDT 2021


>>> A possible race condition exists where a request to send data that is
>>> enqueued from nvme_tcp_handle_r2t() will not be observed by
>>> nvme_tcp_send_all() if it happens to be running. The driver relies on
>>> io_work to send the enqueued request when it runs again, but the
>>> concurrently running nvme_tcp_send_all() may not have released the
>>> send_mutex at that time. If no future commands are enqueued to re-kick
>>> the io_work, the request will timeout in the SEND_H2C state, resulting
>>> in a timeout error like:
>>>
>>>     nvme nvme0: queue 1: timeout request 0x3 type 6
>>>
>>> Ensure the io_work continues to run as long as the req_list is not
>>> empty.
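
For anyone following along, this is roughly the shape such a change would
take in the io_work loop (simplified from drivers/nvme/host/tcp.c with
error handling trimmed; a sketch of the idea, not the exact patch hunk):

static void nvme_tcp_io_work(struct work_struct *w)
{
	struct nvme_tcp_queue *queue =
		container_of(w, struct nvme_tcp_queue, io_work);
	unsigned long deadline = jiffies + msecs_to_jiffies(1);

	do {
		bool pending = false;

		if (mutex_trylock(&queue->send_mutex)) {
			if (nvme_tcp_try_send(queue) > 0)
				pending = true;
			mutex_unlock(&queue->send_mutex);
		} else {
			/*
			 * send_mutex is held elsewhere (e.g. the queuing
			 * path running nvme_tcp_send_all()); that holder
			 * may already have seen an empty req_list, so
			 * treat queued requests as pending work and let
			 * io_work run again.
			 */
			pending = !llist_empty(&queue->req_list);
		}

		if (nvme_tcp_try_recv(queue) > 0)
			pending = true;

		if (!pending)
			return;
	} while (!time_after(jiffies, deadline));

	/* quota exhausted with work still pending: requeue ourselves */
	queue_work_on(queue->io_cpu, nvme_tcp_wq, &queue->io_work);
}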
>>
>> There is a version of this patch that I personally suggested before,
>> however I couldn't explain why that should happen...
>>
>> nvme_tcp_send_all tries to send everything it has queued, which means
>> it should either be able to send everything, or it should see a full
>> socket buffer. But in case the socket buffer is full, there should be a
>> .write_space() sk callback triggering when the socket buffer evacuates
>> space... Maybe there is a chance that write_space triggered, started
>> execution, and that the send_mutex is still taken?
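
(For context, the .write_space() hook referenced here only kicks io_work
once the socket buffer drains again; roughly, simplified from the driver:)

static void nvme_tcp_write_space(struct sock *sk)
{
	struct nvme_tcp_queue *queue;

	read_lock_bh(&sk->sk_callback_lock);
	queue = sk->sk_user_data;
	if (likely(queue && sk_stream_is_writeable(sk))) {
		clear_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
		/* io_work still has to win send_mutex to make progress */
		queue_work_on(queue->io_cpu, nvme_tcp_wq, &queue->io_work);
	}
	read_unlock_bh(&sk->sk_callback_lock);
}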
> 
> nvme_tcp_send_all() breaks out of the loop if nvme_tcp_fetch_request()
> returns NULL. If that happens just before io_work calls
> nvme_tcp_handle_r2t() to enqueue the H2C request, nvme_tcp_send_all()
> will not see that request, but send_mutex is still held. We're counting
> on io_work to run again to handle sending the H2C data in that case.
> Unlikely as it sounds, if the same nvme_tcp_send_all() context is still
> holding the send_mutex when io_work gets back to trying to take it, how
> will the data get sent?
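
The loop in question, approximately (simplified; nvme_tcp_fetch_request()
is called from inside nvme_tcp_try_send()):

/* Called from the queuing path with queue->send_mutex held. */
static inline void nvme_tcp_send_all(struct nvme_tcp_queue *queue)
{
	int ret;

	/*
	 * Drain until nvme_tcp_try_send() returns 0, i.e. until
	 * nvme_tcp_fetch_request() finds nothing left to send. An H2C
	 * request that nvme_tcp_handle_r2t() queues right after that
	 * point is invisible to this loop, and send_mutex stays held
	 * until the caller releases it.
	 */
	do {
		ret = nvme_tcp_try_send(queue);
	} while (ret > 0);
}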

Yes, you are correct, I overlooked this race. I guess this is all coming
from having fewer queues than cores (where rx really competes with tx),
which is a non-default setup and not as common.

This is enough to convince me that this is needed:
Reviewed-by: Sagi Grimberg <sagi at grimberg.me>


>> Can we maybe try to catch if that is the case?
> 
> Do you have a better idea on how we can catch this? I think there was
> only one occurrence of this sighting so far, and it looks like it took a
> long time to encounter it, but we will try again if you have a proposal.

We can continue to test with the patch and hunt for another occurrence;
given the argument above, this patch is needed regardless...


