[PATCH 3/3] nvme-tcp: fix I/O stalls on congested sockets
Hannes Reinecke
hare at suse.de
Tue Apr 15 00:07:15 PDT 2025
On 4/3/25 08:55, Hannes Reinecke wrote:
> When the socket is busy processing nvme_tcp_try_recv() might
> return -EAGAIN, but this doesn't automatically imply that
> the sending side is blocked, too.
> So check if there are pending requests once nvme_tcp_try_recv()
> returns -EAGAIN and continue with the sending loop to avoid
> I/O stalls.
>
> Acked-by: Chris Leech <cleech at redhat.com>
> Signed-off-by: Hannes Reinecke <hare at kernel.org>
> ---
> drivers/nvme/host/tcp.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
> index 1a319cb86453..87f1d7a4ea06 100644
> --- a/drivers/nvme/host/tcp.c
> +++ b/drivers/nvme/host/tcp.c
> @@ -1389,9 +1389,12 @@ static void nvme_tcp_io_work(struct work_struct *w)
>  		result = nvme_tcp_try_recv(queue);
>  		if (result > 0)
>  			pending = true;
> -		else if (unlikely(result < 0))
> +		else if (unlikely(result < 0) && result != -EAGAIN)
>  			return;
>
> +		if (nvme_tcp_queue_has_pending(queue))
> +			pending = true;
> +
>  		if (!pending || !queue->rd_enabled)
>  			return;
>
The various 'try_send' functions will return -EAGAIN for a partial send.
But that doesn't indicate a blocked Tx; rather, we should retry directly.
Hence this check.
Unless you tell me otherwise, and even a partial send causes
->write_space() to be invoked; then we wouldn't _need_ it. It would
still be an optimisation, as we're saving the round-trip via the socket
callbacks.
We could aim for a different error code here, to differentiate between a
'real' EAGAIN and a partial send.
Whatever you prefer.
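
To illustrate the control flow being discussed, here is a minimal
user-space sketch of the loop with and without the change. It is not the
kernel code: struct mock_queue, the mock_*() helpers and the 'patched'
flag are all hypothetical names made up for this example, the receive
side is hard-wired to return one fixed value, and a plain iteration
limit stands in for whatever bound the real work function applies.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these names exist in the kernel. */
struct mock_queue {
	int pending_sends;	/* requests still waiting to be sent */
	int recv_result;	/* value the mocked receive path always returns */
	bool rd_enabled;
};

/* Sends one request per call; returns 1 on progress, 0 if nothing to do. */
static int mock_try_send(struct mock_queue *q)
{
	if (q->pending_sends == 0)
		return 0;
	q->pending_sends--;
	return 1;
}

/* Stand-in for the receive path, e.g. always -EAGAIN on a busy socket. */
static int mock_try_recv(struct mock_queue *q)
{
	return q->recv_result;
}

static bool mock_queue_has_pending(const struct mock_queue *q)
{
	return q->pending_sends > 0;
}

/*
 * One invocation of the modelled work function.
 * Returns the number of requests sent before the loop gave up.
 */
static int mock_io_work(struct mock_queue *q, bool patched, int max_iterations)
{
	int sent = 0;

	for (int i = 0; i < max_iterations; i++) {
		bool pending = false;
		int result;

		result = mock_try_send(q);
		if (result > 0) {
			pending = true;
			sent++;
		} else if (result < 0) {
			break;
		}

		result = mock_try_recv(q);
		if (result > 0)
			pending = true;
		else if (result < 0 && (!patched || result != -EAGAIN))
			break;	/* unpatched: any receive error ends the loop */

		/* patched: queued requests keep the loop going */
		if (patched && mock_queue_has_pending(q))
			pending = true;

		if (!pending || !q->rd_enabled)
			break;
	}
	return sent;
}
```

With three queued requests and a receive side that keeps returning
-EAGAIN, calling mock_io_work(&q, false, 8) stops after the first send,
while mock_io_work(&q, true, 8) drains all three in the same invocation.
In this model the remaining requests in the unpatched case would have to
wait for some later trigger to run the work function again.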
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare at suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich