[PATCH 3/3] nvme-tcp: fix I/O stalls on congested sockets
Sagi Grimberg
sagi at grimberg.me
Tue Apr 15 14:35:20 PDT 2025
On 15/04/2025 10:07, Hannes Reinecke wrote:
> On 4/3/25 08:55, Hannes Reinecke wrote:
>> When the socket is busy processing, nvme_tcp_try_recv() might
>> return -EAGAIN, but this doesn't automatically imply that
>> the sending side is blocked, too.
>> So check if there are pending requests once nvme_tcp_try_recv()
>> returns -EAGAIN and continue with the sending loop to avoid
>> I/O stalls.
>>
>> Acked-by: Chris Leech <cleech at redhat.com>
>> Signed-off-by: Hannes Reinecke <hare at kernel.org>
>> ---
>> drivers/nvme/host/tcp.c | 5 ++++-
>> 1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
>> index 1a319cb86453..87f1d7a4ea06 100644
>> --- a/drivers/nvme/host/tcp.c
>> +++ b/drivers/nvme/host/tcp.c
>> @@ -1389,9 +1389,12 @@ static void nvme_tcp_io_work(struct work_struct *w)
>> result = nvme_tcp_try_recv(queue);
>> if (result > 0)
>> pending = true;
>> - else if (unlikely(result < 0))
>> + else if (unlikely(result < 0) && result != -EAGAIN)
>> return;
>> + if (nvme_tcp_queue_has_pending(queue))
>> + pending = true;
>> +
>> if (!pending || !queue->rd_enabled)
>> return;
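To make the control flow of the hunk concrete, here is a minimal user-space model of the receive half of the patched loop body. It is a sketch, not driver code: io_work_keep_looping() and io_work_model_example() are hypothetical names, `has_pending` stands in for nvme_tcp_queue_has_pending(queue) and `rd_enabled` for queue->rd_enabled. The send path and the time budget of the real nvme_tcp_io_work() are not modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical model of the receive half of the patched loop body.
 * pending:     whatever the send half of the loop left behind
 * recv_result: return value of nvme_tcp_try_recv()
 * has_pending: stands in for nvme_tcp_queue_has_pending(queue)
 * rd_enabled:  stands in for queue->rd_enabled
 * Returns true if the worker would go around the loop again.
 */
static bool io_work_keep_looping(bool pending, int recv_result,
				 bool has_pending, bool rd_enabled)
{
	if (recv_result > 0)
		pending = true;
	else if (recv_result < 0 && recv_result != -EAGAIN)
		return false;		/* hard receive error: stop */

	/* Added by the patch: -EAGAIN on Rx says nothing about Tx. */
	if (has_pending)
		pending = true;

	return pending && rd_enabled;
}

void io_work_model_example(void)
{
	/* Rx would block but requests are queued: before the patch this
	 * case returned early, now the worker loops again. */
	assert(io_work_keep_looping(false, -EAGAIN, true, true));
	/* Nothing received and nothing queued: the worker stops. */
	assert(!io_work_keep_looping(false, -EAGAIN, false, true));
}
```

Note that the model, like the hunk, sets `pending` whenever requests are queued, regardless of whether the socket can currently take more data.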
>
> The various 'try_send' functions will return -EAGAIN for a partial send.
> But that doesn't indicate a blocked Tx; rather, we should retry directly.
> Hence this check.
>
> Unless you tell me otherwise, i.e. that even a partial send will cause
> ->write_space() to be invoked, in which case we wouldn't _need_ it.
Umm, that is my understanding. If you tried to send X and were able to
send Y where Y < X, you shouldn't have to keep trying in a busy loop;
the stack should tell you when you can send again.
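A user-space analogy of that contract, offered as a sketch only: after a partial send() on a non-blocking stream socket, the sender waits for POLLOUT instead of calling send() again straight away. The assumption here is that POLLOUT plays the role that ->write_space() plays inside the kernel. send_all_nonblocking() is a hypothetical helper, not nvme-tcp code; it relies on the Linux send() flags MSG_DONTWAIT and MSG_NOSIGNAL.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper: push all of buf through a stream socket without
 * ever blocking inside send(). When the socket accepts only part of the
 * data (or none of it), wait in poll() for POLLOUT, i.e. for the kernel
 * to report free send-buffer space, instead of retrying send() in a
 * busy loop. Returns len on success, -1 on error (errno is set).
 */
static ssize_t send_all_nonblocking(int fd, const char *buf, size_t len)
{
	size_t off = 0;

	assert(fd >= 0);
	while (off < len) {
		ssize_t n = send(fd, buf + off, len - off,
				 MSG_DONTWAIT | MSG_NOSIGNAL);

		if (n > 0) {
			off += (size_t)n;
			if (off == len)
				break;
			/* Partial send: the buffer is full, wait below. */
		} else if (n < 0 && errno == EINTR) {
			continue;
		} else if (n < 0 && errno != EAGAIN && errno != EWOULDBLOCK) {
			return -1;
		}

		struct pollfd pfd = { .fd = fd, .events = POLLOUT };

		if (poll(&pfd, 1, -1) < 0 && errno != EINTR)
			return -1;
	}
	return (ssize_t)off;
}
```

A real event loop would register the descriptor with epoll and serve other sockets while waiting, rather than blocking in poll() on a single descriptor; the single poll() call just keeps the sketch short.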
> It would
> still be an optimisation as we're saving the round-trip via socket
> callbacks.
But you are busy-looping on a socket that cannot accept new data, while
there are other sockets that the kthread could be working on.
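A toy model of that point, using made-up types: one worker pass over several queues sends whatever each socket can take and moves on, so a full socket costs nothing until a write-space event reschedules it. struct model_queue and worker_pass() are hypothetical, and `space` is simply the free send-buffer room the model assumes for each socket.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one queue: bytes waiting to be sent and the
 * free space its socket currently has in the send buffer. */
struct model_queue {
	size_t backlog;
	size_t space;
};

/*
 * One worker pass: for each queue, send as much as its socket accepts
 * and move on. A queue left with backlog and no space is not retried
 * here; the model assumes a later write-space event reschedules it.
 * Returns the number of bytes sent in this pass.
 */
static size_t worker_pass(struct model_queue *q, size_t nq)
{
	size_t sent = 0;

	assert(q != NULL || nq == 0);
	for (size_t i = 0; i < nq; i++) {
		size_t n = q[i].backlog < q[i].space ? q[i].backlog
						     : q[i].space;

		q[i].backlog -= n;
		q[i].space -= n;
		sent += n;
	}
	return sent;
}
```

With queues `{ 100, 10 }` and `{ 5, 50 }`, the first pass moves 15 bytes and drains the second queue; a second pass moves nothing, which is all that spinning on the first queue would achieve until its socket frees space.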
>
> We could aim for a different error here, to differentiate between a
> 'real' EAGAIN and a partial send.
> Whatever you prefer.
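One way to carry that distinction, sketched with hypothetical names (enum send_status, classify_send()). The thread does not say which error code would be used, so this sketch maps a raw send() return value and errno onto separate states instead of picking a kernel errno.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

enum send_status {
	SEND_DONE,	/* everything requested was accepted */
	SEND_PARTIAL,	/* accepted fewer bytes than requested */
	SEND_BLOCKED,	/* nothing accepted: a "real" EAGAIN */
	SEND_ERROR,	/* any other failure */
};

/* Hypothetical classifier for the result of one send() call. */
static enum send_status classify_send(ssize_t ret, int err, size_t requested)
{
	assert(requested > 0);
	if (ret < 0)
		return (err == EAGAIN || err == EWOULDBLOCK) ? SEND_BLOCKED
							     : SEND_ERROR;
	if ((size_t)ret < requested)
		return SEND_PARTIAL;
	return SEND_DONE;
}
```

The classification itself is neutral: a caller following Hannes's reasoning would retry immediately on SEND_PARTIAL, while one following Sagi's would treat SEND_PARTIAL like SEND_BLOCKED and wait for write space.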
I still don't understand why a partial send warrants a busy-loop call to
sock_sendmsg...
My assumption is that the call right after the partial send will see an
EAGAIN error. But I may be missing something here... I just never
expected that a partial write means we must busy-loop sending to the
socket.
What does a blocking sendmsg do under the hood? Does it also follow this
practice?
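The assumption about the call after a partial send can be probed from user space, sketched here under stated assumptions: an AF_UNIX stream socketpair stands in for the TCP socket (its send buffer is drained only by the reader, so with no reader the outcome does not depend on timing), and fill_until_error() is a hypothetical helper. It keeps issuing non-blocking sends until one fails, counting partial ones. Whether partial sends appear depends on chunk size and buffer accounting; what the sketch checks is that once the buffer is full, the next call fails with EAGAIN instead of accepting more data. For the blocking case, as far as I can tell TCP's tcp_sendmsg_locked() sleeps in sk_stream_wait_memory() until space frees up, so a blocking caller normally gets the full length back and never sees the partial send; treat that as a pointer to check, not an authoritative answer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical probe: keep sending chunk-sized writes on a non-blocking
 * stream socket nobody reads from, and stop at the first failing send().
 * Returns the bytes queued; *partials counts calls that accepted fewer
 * bytes than requested, *stop_errno is the errno of the failing call.
 */
static size_t fill_until_error(int fd, size_t chunk, int *partials,
			       int *stop_errno)
{
	static const char buf[65536];
	size_t total = 0;

	assert(chunk > 0 && chunk <= sizeof(buf));
	*partials = 0;
	for (;;) {
		ssize_t n = send(fd, buf, chunk, MSG_DONTWAIT | MSG_NOSIGNAL);

		if (n < 0) {
			*stop_errno = errno;
			return total;
		}
		total += (size_t)n;
		if ((size_t)n < chunk)
			(*partials)++;
	}
}
```

Calling send() again at that point, with nobody draining the other end, just returns EAGAIN again; that repeated call is the busy loop in question.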