[PATCHv5 0/2] nvme-tcp: fixup I/O stall on congested sockets

Hannes Reinecke hare at kernel.org
Tue Apr 29 01:17:37 PDT 2025


Hi all,

I have been chasing keep-alive timeouts with TLS enabled for the last few
weeks (months, even ...). On larger setups (e.g. with 32 queues) the connection
never got established properly, as I kept hitting keep-alive timeouts before
the last queue got connected.
It turns out that occasionally we simply do not send the keep-alive request;
it has been added to the request list, but the io_work workqueue function is
never restarted as it bails out after nvme_tcp_try_recv() returns -EAGAIN.
During debugging I also found that we're quite lazy with the list
handling of requests, so I've added a patch to ensure that all list
elements are properly terminated.
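
To illustrate the stall described above, here is a minimal userspace sketch.
It is a toy model under stated assumptions, not the code from
drivers/nvme/host/tcp.c: struct model_queue, model_try_send(),
model_try_recv(), model_io_work() and the arrivals_mid_pass parameter are all
hypothetical names made up for this example. The model assumes a request can
land on the list after the send step of a worker pass, and that the worker
returns as soon as the receive step reports -EAGAIN.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one queue; not the kernel's struct nvme_tcp_queue. */
struct model_queue {
	int pending_sends;   /* requests on the list, not yet sent (e.g. keep-alive) */
	int rx_available;    /* units of receive data waiting on the socket */
	bool sock_writeable; /* false while the socket is congested */
	bool work_queued;    /* true if the worker is scheduled to run again */
};

/* Send one pending request if there is one and the socket accepts data. */
static int model_try_send(struct model_queue *q)
{
	if (q->pending_sends == 0)
		return 0;
	if (!q->sock_writeable)
		return -EAGAIN;
	q->pending_sends--;
	return 1;
}

/* Consume one unit of receive data, or report -EAGAIN if none is waiting. */
static int model_try_recv(struct model_queue *q)
{
	if (q->rx_available == 0)
		return -EAGAIN;
	q->rx_available--;
	return 1;
}

/*
 * One pass of the modelled worker. arrivals_mid_pass requests are added
 * after the send step. With requeue_fix set, the worker reschedules itself
 * on -EAGAIN only if requests are pending and the socket is writeable.
 */
static void model_io_work(struct model_queue *q, int arrivals_mid_pass,
			  bool requeue_fix)
{
	assert(q != NULL);

	q->work_queued = false;                /* this pass consumes the scheduled run */
	(void)model_try_send(q);
	q->pending_sends += arrivals_mid_pass; /* request queued after the send step */

	if (model_try_recv(q) == -EAGAIN) {
		if (requeue_fix && q->pending_sends > 0 && q->sock_writeable)
			q->work_queued = true;
		return;                        /* without the fix: nobody restarts us */
	}
	q->work_queued = true;                 /* made progress, run again */
}
```

Calling model_io_work() with requeue_fix set to false and one mid-pass arrival
leaves the request pending with no further run scheduled, which is the stall
in this model. With requeue_fix set to true the worker runs once more and
sends it, while a congested (non-writeable) socket does not cause a requeue.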

As usual, comments and reviews are welcome.

Changes to v4:
- Drop check for 'queue->req' as noticed by Sagi

Changes to v3:
- Drop first patch as it has already been applied
- Include reviews from Sagi
- Check for sk_sock_is_writeable() to avoid requeuing io_work when
  the socket is blocked

Changes to v2:
- Removed AEN patches again

Changes to the original submission:
- Include reviews from Chris Leech
- Add patch to requeue namespace scan
- Add patch to re-read ANA log page

Hannes Reinecke (2):
  nvme-tcp: sanitize request list handling
  nvme-tcp: fix I/O stalls on congested sockets

 drivers/nvme/host/tcp.c | 22 ++++++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

-- 
2.35.3



