[PATCH 0/3] nvme-tcp: queue stalls under high load

Sagi Grimberg sagi at grimberg.me
Fri May 20 02:20:35 PDT 2022


> Hi all,
> 
> one of our partners observed queue stalls and I/O timeouts under
> high load. Analysis revealed extremely 'choppy' I/O behaviour when
> running large transfers on systems with low-performance links (e.g.
> 1GigE networks).
> We had a system with 30 queues trying to transfer 128M requests; a
> simple calculation shows that transferring a _single_ request on all
> queues will take up to 38 seconds, so the last request times out
> before it is even sent.
> As a solution I first fixed up the timeout handler to reset the timeout
> if the request is still queued or in the process of being sent. The
> second patch modifies the send path to only accept new requests if
> there is enough space on the TX queue, and the third breaks up the send
> loop to avoid system stalls when sending large requests.
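The three changes described in the cover letter can be modelled in a few lines. This is a minimal, hypothetical sketch, not the nvme-tcp driver code: `ReqState`, `Verdict`, `timeout_verdict`, `can_admit` and `send_in_passes` are illustrative names, the verdicts only mirror the idea of the block layer's "done" vs. "reset timer" outcomes, and the admission policy (room for the whole request) and the per-pass byte budget are assumptions.

```python
from enum import Enum, auto


class ReqState(Enum):
    """Hypothetical send state of one NVMe/TCP request."""
    QUEUED = auto()   # waiting in the per-queue send list
    SENDING = auto()  # partially written to the socket
    SENT = auto()     # fully handed to the network, awaiting completion


class Verdict(Enum):
    """Hypothetical stand-ins for the timeout handler's possible outcomes."""
    DONE = auto()         # treat as a real timeout and start recovery
    RESET_TIMER = auto()  # re-arm the timer and keep waiting


def timeout_verdict(state: ReqState) -> Verdict:
    # Patch 1 idea: a request that has not completely left the host
    # cannot have been lost by the controller, so extend its deadline.
    if state in (ReqState.QUEUED, ReqState.SENDING):
        return Verdict.RESET_TIMER
    return Verdict.DONE


def can_admit(tx_free: int, req_len: int) -> bool:
    # Patch 2 idea (assumed policy): accept a new request only if the
    # TX queue currently has room for all of it.
    return tx_free >= req_len


def send_in_passes(req_len: int, budget: int) -> list[int]:
    # Patch 3 idea: send at most `budget` bytes per pass (yielding in
    # between) instead of looping until the whole request is written.
    passes = []
    remaining = req_len
    while remaining > 0:
        chunk = min(remaining, budget)
        passes.append(chunk)
        remaining -= chunk
    return passes
```

With an assumed 64 KiB budget, a 128 MiB request is split into 2048 passes, each of which gives the scheduler a chance to run other work.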

What is the average latency you are seeing with this test?
I'm guessing more than 30 seconds :)
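For reference, the 38-second figure from the cover letter can be reproduced with a back-of-the-envelope calculation. This sketch assumes "128M" means 128 × 10^6 bytes and an effective throughput of about 100 MB/s after protocol overhead (raw 1GigE line rate is 125 MB/s); neither value is stated in the thread, and `drain_time_s` is a hypothetical helper.

```python
def drain_time_s(queues: int, req_bytes: float, throughput_bps: float) -> float:
    """Seconds until the last of `queues` requests of `req_bytes` each has
    been transmitted, assuming they share one fully busy link."""
    return queues * req_bytes / throughput_bps


# Assumption: "128M" read as 128 * 10**6 bytes.
REQ = 128 * 10**6
line_rate = drain_time_s(30, REQ, 125e6)  # raw 1 Gbit/s = 125 MB/s
effective = drain_time_s(30, REQ, 100e6)  # assumed ~100 MB/s after overhead
print(f"line rate: {line_rate:.1f} s, effective: {effective:.1f} s")
# prints "line rate: 30.7 s, effective: 38.4 s"
```

Even at full line rate the last request waits longer than the nvme_core default I/O timeout of 30 seconds (the `io_timeout` module parameter) before it is sent.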
