[PATCH 3/3] nvme-tcp: fix I/O stalls on congested sockets

Sagi Grimberg sagi at grimberg.me
Tue Apr 15 14:35:20 PDT 2025



On 15/04/2025 10:07, Hannes Reinecke wrote:
> On 4/3/25 08:55, Hannes Reinecke wrote:
>> When the socket is busy processing nvme_tcp_try_recv() might
>> return -EAGAIN, but this doesn't automatically imply that
>> the sending side is blocked, too.
>> So check if there are pending requests once nvme_tcp_try_recv()
>> returns -EAGAIN and continue with the sending loop to avoid
>> I/O stalls.
>>
>> Acked-by: Chris Leech <cleech at redhat.com>
>> Signed-off-by: Hannes Reinecke <hare at kernel.org>
>> ---
>>   drivers/nvme/host/tcp.c | 5 ++++-
>>   1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
>> index 1a319cb86453..87f1d7a4ea06 100644
>> --- a/drivers/nvme/host/tcp.c
>> +++ b/drivers/nvme/host/tcp.c
>> @@ -1389,9 +1389,12 @@ static void nvme_tcp_io_work(struct work_struct *w)
>>           result = nvme_tcp_try_recv(queue);
>>           if (result > 0)
>>               pending = true;
>> -        else if (unlikely(result < 0))
>> +        else if (unlikely(result < 0) && result != -EAGAIN)
>>               return;
>> +        if (nvme_tcp_queue_has_pending(queue))
>> +            pending = true;
>> +
>>           if (!pending || !queue->rd_enabled)
>>               return;
>
> The various 'try_send' functions will return -EAGAIN for a partial send.
> But that doesn't indicate a blocked Tx; rather, we should retry directly.
> Hence this check.
>
> Unless you tell me differently and even a partial send will cause
> ->write_space() to be invoked, then we wouldn't _need_ it.

Umm, that is my understanding. If you tried to send X and were able to
send Y where Y < X, you shouldn't have to keep trying in a busy loop;
the stack should tell you when you can send again.

> It would
> still be an optimisation as we're saving the round-trip via socket
> callbacks.

But you are doing a busy loop on a socket that cannot accept new data,
while there are other sockets that the kthread could be working on.

>
> We could aim for a different error here, to differentiate between a
> 'real' EAGAIN and a partial send.
> Whatever you prefer.

I still don't understand why a partial send warrants a busy loop call to
sock_sendmsg...

My assumption is that the call right after the partial send will see an
EAGAIN error. But I may be missing something here... I just never
expected that a partial write means that we must busy loop sending to
the socket.
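
For illustration only, here is a minimal userspace sketch of that
assumption. It is an analogy, not the nvme-tcp code path: it uses a
non-blocking AF_UNIX socketpair rather than a kernel TCP socket, and
every name in it (PROBE_LEN, struct send_probe, probe_partial_send) is
made up for this example. It sends more than the socket buffer can hold,
retries immediately, then lets the peer drain and asks poll() for
POLLOUT, which plays roughly the role ->write_space() plays in the
kernel. On Linux I would expect a short first send and EAGAIN on the
retry, but that is an expectation to check, not a guarantee.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical payload size, chosen to exceed the default socket buffer. */
#define PROBE_LEN (4u * 1024u * 1024u)

/* What the experiment observed (all names here are illustrative). */
struct send_probe {
	ssize_t first;            /* bytes accepted by the first send() */
	int second_errno;         /* errno of the immediate retry, 0 if it sent data */
	int writable_after_drain; /* 1 if poll() reported POLLOUT after the peer read */
};

static char payload[PROBE_LEN];

/* Put a descriptor into non-blocking mode. */
static int set_nonblock(int fd)
{
	int flags = fcntl(fd, F_GETFL, 0);

	return flags < 0 ? -1 : fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Returns 0 on success and fills *out, -1 if the setup itself failed. */
static int probe_partial_send(struct send_probe *out)
{
	static char sink[65536];
	struct pollfd pfd;
	int sv[2];
	int ret = -1;

	assert(out != NULL);
	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;
	if (set_nonblock(sv[0]) < 0 || set_nonblock(sv[1]) < 0)
		goto out_close;

	/* First attempt: more data than the socket buffer can hold. */
	out->first = send(sv[0], payload, sizeof(payload), 0);
	if (out->first < 0)
		goto out_close;

	/* Immediate retry with the remainder, without waiting for any event. */
	out->second_errno = 0;
	if ((size_t)out->first < sizeof(payload) &&
	    send(sv[0], payload + out->first,
		 sizeof(payload) - (size_t)out->first, 0) < 0)
		out->second_errno = errno;

	/* The peer drains its queue, which frees send-buffer space. */
	while (read(sv[1], sink, sizeof(sink)) > 0)
		;

	/* Instead of spinning on send(), ask the stack whether we may write. */
	pfd.fd = sv[0];
	pfd.events = POLLOUT;
	pfd.revents = 0;
	out->writable_after_drain =
		poll(&pfd, 1, 1000) == 1 && (pfd.revents & POLLOUT);
	ret = 0;
out_close:
	close(sv[0]);
	close(sv[1]);
	return ret;
}
```

Calling probe_partial_send() from a small main() and printing the three
fields is enough to try it. If the retry does report EAGAIN, retrying in
a loop only burns CPU until the peer reads, and waiting for the
writability notification gets there without spinning.
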

What does a blocking sendmsg do under the hood? Does it also follow this
practice?


