[PATCH v2 2/3] nvme-tcp: support specifying the congestion-control
John Meneghini
jmeneghi at redhat.com
Tue Apr 5 09:50:24 PDT 2022
If you want things to slow down with NVMe, use the protocol's built-in flow control mechanism: SQ flow control. This will keep
commands out of the transport queue and avoid the possibility of unexpected command timeouts.
But this is another topic for discussion.
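As a rough illustration of the idea (the structure and field names below are only illustrative, not the actual nvme-tcp driver
code): with SQ flow control the controller reports its current SQ head pointer in every completion, and the host submits a new
command only while free slots remain in the submission queue.

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical host-side view of one submission queue; sq_tail and
     * sq_head wrap at q_depth. */
    struct sq_state {
            uint16_t q_depth;  /* queue size negotiated at connect time */
            uint16_t sq_tail;  /* next slot the host will fill */
            uint16_t sq_head;  /* last head pointer reported by the controller */
    };

    /* Submit only while the queue is not full; one slot is always left
     * empty so that "full" and "empty" stay distinguishable. */
    static bool sq_has_room(const struct sq_state *sq)
    {
            uint16_t used = (sq->sq_tail + sq->q_depth - sq->sq_head) % sq->q_depth;

            return used < sq->q_depth - 1;
    }

    /* Each completion carries the controller's current SQ head pointer;
     * that is what frees slots and throttles the host. */
    static void sq_update_head(struct sq_state *sq, uint16_t cqe_sq_head)
    {
            sq->sq_head = cqe_sq_head;
    }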
/John
On 4/5/22 12:48, John Meneghini wrote:
>
> On 3/29/22 03:46, Sagi Grimberg wrote:
>>> In addition, distributed storage products like the following also have
>>> the above problem:
>>>
>>> - The product consists of a cluster of servers.
>>>
>>> - Each server serves clients via its front-end NIC
>>> (WAN, high latency).
>>>
>>> - All servers interact with each other via NVMe/TCP via back-end NIC
>>> (LAN, low latency, ECN-enabled, ideal for dctcp).
>>
>> Separate networks are still not application (nvme-tcp) specific and as
>> mentioned, we have a way to control that. IMO, this still does not
>> qualify as solid justification to add this to nvme-tcp.
>>
>> What do others think?
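>
> (For reference, the "way to control that" Sagi mentions is, I assume, the generic TCP plumbing: a congestion control
> algorithm can already be selected per route with ip route's congctl attribute, or per socket with the standard TCP_CONGESTION
> socket option. A minimal user-space sketch, purely illustrative:)
>
>     #include <netinet/in.h>
>     #include <netinet/tcp.h>
>     #include <stdio.h>
>     #include <string.h>
>     #include <sys/socket.h>
>     #include <unistd.h>
>
>     int main(void)
>     {
>             const char algo[] = "dctcp";  /* must be available in the running kernel */
>             char cur[16] = "";
>             socklen_t len = sizeof(cur);
>             int fd = socket(AF_INET, SOCK_STREAM, 0);
>
>             if (fd < 0)
>                     return 1;
>
>             /* Select the congestion control algorithm for this socket only. */
>             if (setsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, algo, strlen(algo)))
>                     perror("setsockopt(TCP_CONGESTION)");
>
>             if (!getsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, cur, &len))
>                     printf("congestion control: %s\n", cur);
>
>             close(fd);
>             return 0;
>     }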
>
> OK. I'll bite.
>
> In my experience, adding any type of QoS control to a Storage Area Network causes problems because it increases the likelihood
> of ULP timeouts (command timeouts).
>
> NAS protocols like NFS and CIFS have built-in assumptions about latency. They have long timeouts at the session layer and they
> trade latency for reliable delivery. SAN protocols like iSCSI and NVMe/TCP make no such trade-off. Block protocols have much
> shorter per-command timeouts and they still expect reliable delivery, so doing anything to the TCP connection which could
> increase latency runs the risk of causing command timeouts as a side effect. In NVMe we also have the Keep Alive timeout, which
> could be affected by TCP latency. It's for this reason that most SANs are deployed on LANs, not WANs. It's also why most
> cluster monitoring mechanisms (components that maintain cluster-wide membership through heartbeats) use UDP, not TCP.
>
> With NVMe/TCP we want the connection layer to go as fast as possible, and I agree with Sagi that adding any kind of QoS
> mechanism to the transport is not desirable.
>
> /John
>