[PATCH 3/3] nvme-tcp: fix I/O stalls on congested sockets

Hannes Reinecke hare at suse.de
Tue Apr 15 00:07:15 PDT 2025


On 4/3/25 08:55, Hannes Reinecke wrote:
> When the socket is busy processing nvme_tcp_try_recv() might
> return -EAGAIN, but this doesn't automatically imply that
> the sending side is blocked, too.
> So check if there are pending requests once nvme_tcp_try_recv()
> returns -EAGAIN and continue with the sending loop to avoid
> I/O stalls.
> 
> Acked-by: Chris Leech <cleech at redhat.com>
> Signed-off-by: Hannes Reinecke <hare at kernel.org>
> ---
>   drivers/nvme/host/tcp.c | 5 ++++-
>   1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
> index 1a319cb86453..87f1d7a4ea06 100644
> --- a/drivers/nvme/host/tcp.c
> +++ b/drivers/nvme/host/tcp.c
> @@ -1389,9 +1389,12 @@ static void nvme_tcp_io_work(struct work_struct *w)
>   		result = nvme_tcp_try_recv(queue);
>   		if (result > 0)
>   			pending = true;
> -		else if (unlikely(result < 0))
> +		else if (unlikely(result < 0) && result != -EAGAIN)
>   			return;
>   
> +		if (nvme_tcp_queue_has_pending(queue))
> +			pending = true;
> +
>   		if (!pending || !queue->rd_enabled)
>   			return;
>   

The various 'try_send' functions return -EAGAIN for a partial send.
That doesn't indicate a blocked Tx, though; rather, we should retry
directly. Hence this check.

Unless you tell me differently, i.e. that even a partial send will
cause ->write_space() to be invoked; in that case we wouldn't _need_
it. It would still be an optimisation, as we're saving the round-trip
via socket callbacks.
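
To illustrate, here is a rough userspace sketch of the per-iteration
decision the hunk above makes. It is not kernel code: io_work_step(),
enum io_step and the boolean parameters are hypothetical names that
stand in for the results of the try_send/try_recv calls and for
nvme_tcp_queue_has_pending().

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of one pass through the io_work loop. */
enum io_step { IO_STOP, IO_CONTINUE };

static enum io_step io_work_step(bool send_made_progress, int recv_result,
				 bool has_pending_requests, bool rd_enabled)
{
	/* Assumption: the send half of the loop already ran this pass. */
	bool pending = send_made_progress;

	if (recv_result > 0)
		pending = true;
	else if (recv_result < 0 && recv_result != -EAGAIN)
		return IO_STOP;		/* hard receive error */

	/* -EAGAIN on receive says nothing about the send side. */
	if (has_pending_requests)
		pending = true;

	if (!pending || !rd_enabled)
		return IO_STOP;

	return IO_CONTINUE;		/* go around the loop again */
}
```

With this model, a receive result of -EAGAIN combined with queued
requests yields IO_CONTINUE, which is the case the patch is after.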

We could aim for a different error here, to differentiate between a
'real' EAGAIN and a partial send.
Whatever you prefer.
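
Something along these lines is what I have in mind, purely as a sketch:
the enum, classify_send() and should_retry_now() are made-up names, and
treating a zero-byte result as "blocked" is an assumption of the sketch,
not what tcp.c does today.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome codes; not part of drivers/nvme/host/tcp.c. */
enum send_outcome {
	SEND_COMPLETE,	/* everything requested went out */
	SEND_PARTIAL,	/* some bytes went out; retry directly */
	SEND_BLOCKED,	/* 'real' EAGAIN; wait for ->write_space() */
	SEND_FAILED,	/* any other error */
};

/* 'ret' models the raw result of a sendmsg-like call. */
static enum send_outcome classify_send(size_t requested, long ret)
{
	if (ret < 0)
		return ret == -EAGAIN ? SEND_BLOCKED : SEND_FAILED;
	if ((size_t)ret >= requested)
		return SEND_COMPLETE;
	if (ret == 0)
		return SEND_BLOCKED;	/* assumption: no progress at all */
	return SEND_PARTIAL;
}

/* Only a partial send warrants an immediate retry from the loop. */
static bool should_retry_now(enum send_outcome outcome)
{
	return outcome == SEND_PARTIAL;
}
```

The caller could then loop on SEND_PARTIAL and leave SEND_BLOCKED to the
socket callbacks.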

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare at suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich


