[PATCH 3/3] nvme-tcp: fix I/O stalls on congested sockets
Hannes Reinecke
hare at suse.de
Sun Apr 13 23:21:59 PDT 2025
On 4/14/25 01:09, Sagi Grimberg wrote:
>
>
> On 03/04/2025 9:55, Hannes Reinecke wrote:
>> When the socket is busy processing, nvme_tcp_try_recv() might
>> return -EAGAIN, but this doesn't automatically imply that
>> the sending side is blocked, too.
>> So check if there are pending requests once nvme_tcp_try_recv()
>> returns -EAGAIN and continue with the sending loop to avoid
>> I/O stalls.
>>
>> Acked-by: Chris Leech <cleech at redhat.com>
>> Signed-off-by: Hannes Reinecke <hare at kernel.org>
>> ---
>> drivers/nvme/host/tcp.c | 5 ++++-
>> 1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
>> index 1a319cb86453..87f1d7a4ea06 100644
>> --- a/drivers/nvme/host/tcp.c
>> +++ b/drivers/nvme/host/tcp.c
>> @@ -1389,9 +1389,12 @@ static void nvme_tcp_io_work(struct work_struct *w)
>>  		result = nvme_tcp_try_recv(queue);
>>  		if (result > 0)
>>  			pending = true;
>> -		else if (unlikely(result < 0))
>> +		else if (unlikely(result < 0) && result != -EAGAIN)
>>  			return;
>
> The send path is written so that EAGAIN returns 0 (success returns >0,
> failure returns <0).
> Perhaps we can make recv do the same?
>
I guess we can.
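A minimal sketch of that convention, as a user-space model and not the
actual driver change; normalize_recv_result() is a hypothetical helper
name used only for illustration:

```c
#include <errno.h>

/*
 * Hypothetical helper, not part of drivers/nvme/host/tcp.c.
 * Maps a raw receive result onto the convention described for the
 * send path: >0 means progress, 0 means "would block, try again
 * later", <0 means a real failure.
 */
static int normalize_recv_result(int raw)
{
	/* A congested socket is "no progress", not an error. */
	if (raw == -EAGAIN)
		return 0;

	/* Everything else (progress or a real error) passes through. */
	return raw;
}
```

With that mapping the caller only has to treat negative values as
fatal, and no special case for -EAGAIN is needed at the call site.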
>> +		if (nvme_tcp_queue_has_pending(queue))
>> +			pending = true;
>> +
>
> Something is not clear to me. This suggests that try_send was not able to
> send data on the socket; shouldn't a .write_space() callback wake you when
> the socket send buffer gets some space? Why do you immediately try
> more even if your sendmsg is returning EAGAIN?
>
But that's precisely the point: sendmsg() did _not_ return -EAGAIN.
recvmsg() did.
The current code simply assumes that nvme_tcp_try_send() will return
-EAGAIN once nvme_tcp_try_recv() did.
That is wrong: -EAGAIN on reception can have a number of causes,
and does _not_ imply in any shape or form that sending will hit
the same error.
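To illustrate, here is a rough user-space model of one pass through the
work loop. It is a sketch under stated assumptions, not the driver code:
struct model_queue, io_work_step() and the treat_eagain_as_fatal flag
are all made up for this example, and the send side is reduced to a
simple counter.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of a queue; fields are illustrative only. */
struct model_queue {
	int pending_sends;	/* requests still waiting to be sent */
	int recv_result;	/* what the receive attempt returns */
};

/*
 * Models one iteration of the work loop.
 * Returns true if another iteration should run, false if the
 * loop gives up.
 */
static bool io_work_step(struct model_queue *q, bool treat_eagain_as_fatal)
{
	bool pending = false;

	/* Send side: transmit one request if any are queued. */
	if (q->pending_sends > 0) {
		q->pending_sends--;
		pending = true;
	}

	/* Receive side. */
	if (q->recv_result > 0)
		pending = true;
	else if (q->recv_result < 0 &&
		 (treat_eagain_as_fatal || q->recv_result != -EAGAIN))
		return false;	/* old behaviour bails out here on -EAGAIN */

	/* -EAGAIN on receive says nothing about the send side. */
	if (q->pending_sends > 0)
		pending = true;

	return pending;
}
```

With pending_sends = 2 and recv_result = -EAGAIN, the model stops after
the first step when -EAGAIN is treated as fatal, leaving a request
unsent; with the proposed check it keeps going.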
> This is specific to TLS I assume here?
Not necessarily; I've seen it with my performance testing on 10GigE
with normal TCP, too. TLS is just an easier way to reproduce it.
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare at suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich