[PATCH v2 2/3] nvme-tcp: support specifying the congestion-control
John Meneghini
jmeneghi at redhat.com
Tue Apr 5 09:48:44 PDT 2022
On 3/29/22 03:46, Sagi Grimberg wrote:
>> In addition, distributed storage products like the following also have
>> the above problem:
>>
>> - The product consists of a cluster of servers.
>>
>> - Each server serves clients via its front-end NIC
>> (WAN, high latency).
>>
>> - All servers interact with each other via NVMe/TCP via back-end NIC
>> (LAN, low latency, ECN-enabled, ideal for dctcp).
>
> Separate networks are still not application (nvme-tcp) specific and as
> mentioned, we have a way to control that. IMO, this still does not
> qualify as solid justification to add this to nvme-tcp.
>
> What do others think?
OK. I'll bite.
In my experience, adding any type of QoS control to a Storage Area Network causes problems because it increases the likelihood of
ULP timeouts (command timeouts).
NAS protocols like NFS and CIFS have built-in assumptions about latency: they have long timeouts at the session layer and they
trade latency for reliable delivery. SAN protocols like iSCSI and NVMe/TCP make no such trade-off. Block protocols have much
shorter per-command timeouts while still expecting reliable delivery, so doing anything to the TCP connection that could increase
latency runs the risk of causing command timeouts as a side effect. In NVMe we also have the Keep Alive timeout, which could be
affected by TCP latency. It's for this reason that most SANs are deployed on LANs, not WANs. It's also why most cluster monitor
mechanisms (components that maintain cluster-wide membership through heartbeats) use UDP, not TCP.
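
To put a number on how short those per-command timeouts are: on Linux the default NVMe I/O timeout is 30 seconds, and recent
kernels expose the per-device value through sysfs. A trivial sketch that just reads it back (the device name is only an example;
the attribute reports milliseconds):

/* Hedged sketch: read the per-request timeout the block layer applies to an
 * NVMe namespace.  "nvme0n1" is just an example device; on recent kernels the
 * attribute is in milliseconds (30000 == 30s, the NVMe default). */
#include <stdio.h>

int main(void)
{
	char buf[32];
	FILE *f = fopen("/sys/block/nvme0n1/queue/io_timeout", "r");

	if (!f) {
		perror("io_timeout");
		return 1;
	}
	if (fgets(buf, sizeof(buf), f))
		printf("per-command timeout: %s", buf);	/* e.g. "30000\n" */
	fclose(f);
	return 0;
}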
With NVMe/TCP we want the connection layer to go as fast as possible, and I agree with Sagi that adding any kind of QoS mechanism
to the transport is not desirable.
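
For reference, the existing generic mechanism Sagi alludes to is presumably one of the stock TCP knobs: the
net.ipv4.tcp_congestion_control sysctl, per-route congctl, or the per-socket TCP_CONGESTION option. A minimal userspace sketch
of the per-socket form, purely for illustration (the helper name is mine, not from the patch):

/* Hedged sketch: select a congestion-control algorithm for a single TCP
 * socket via the standard TCP_CONGESTION option, i.e. the generic,
 * non-nvme-specific knob. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

static int set_congctl(int fd, const char *alg)
{
	/* "alg" must name a loaded CC module, e.g. "dctcp" or "cubic"; an
	 * unprivileged caller is further limited to what is listed in
	 * net.ipv4.tcp_allowed_congestion_control. */
	if (setsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, alg, strlen(alg)) < 0) {
		perror("setsockopt(TCP_CONGESTION)");
		return -1;
	}
	return 0;
}

int main(void)
{
	int fd = socket(AF_INET, SOCK_STREAM, 0);

	if (fd < 0 || set_congctl(fd, "dctcp"))
		return 1;
	/* ...connect() over the back-end LAN as usual... */
	close(fd);
	return 0;
}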
/John