[PATCH v2 2/3] nvme-tcp: support specifying the congestion-control

John Meneghini jmeneghi at redhat.com
Tue Apr 5 09:48:44 PDT 2022


On 3/29/22 03:46, Sagi Grimberg wrote:
>> In addition, distributed storage products like the following also have
>> the above problem:
>>
>>      - The product consists of a cluster of servers.
>>
>>      - Each server serves clients via its front-end NIC
>>       (WAN, high latency).
>>
>>      - All servers interact with each other via NVMe/TCP via back-end NIC
>>       (LAN, low latency, ECN-enabled, ideal for dctcp).
> 
> Separate networks are still not application (nvme-tcp) specific and as
> mentioned, we have a way to control that. IMO, this still does not
> qualify as solid justification to add this to nvme-tcp.
> 
> What do others think?

OK. I'll bite.

In my experience, adding any type of QoS control to a Storage Area Network causes problems because it increases the likelihood of
ULP timeouts (command timeouts).

NAS protocols like NFS and CIFS have built-in assumptions about latency: they have long timeouts at the session layer and they
trade latency for reliable delivery. SAN protocols like iSCSI and NVMe/TCP make no such trade-off. Block protocols have much
shorter per-command timeouts while still expecting reliable delivery, so doing anything to the TCP connection that could increase
latency risks the side effect of command timeouts. In NVMe we also have the Keep Alive timeout, which could be affected by TCP
latency. It's for this reason that most SANs are deployed on LANs, not WANs. It's also why most cluster monitor mechanisms
(components that maintain cluster-wide membership through heartbeats) use UDP rather than TCP.

With NVMe/TCP we want the connection layer to go as fast as possible, and I agree with Sagi that adding any kind of QoS mechanism
to the transport is not desirable.
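
As an aside, and purely for illustration (this is not code from the patch under discussion): congestion-control selection is
already a generic facility of the TCP stack. A userspace socket can opt into dctcp per connection via TCP_CONGESTION, and
kernel-created sockets such as nvme-tcp's follow the system-wide net.ipv4.tcp_congestion_control sysctl (or a per-route congctl
setting), which is presumably the kind of existing control Sagi has in mind. A rough sketch of the per-socket knob:

/*
 * Illustrative only: the generic per-socket congestion-control knob.
 * Kernel-created nvme-tcp sockets can't be reached this way; they follow
 * net.ipv4.tcp_congestion_control or a per-route congctl setting instead.
 */
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

int main(void)
{
	int fd = socket(AF_INET, SOCK_STREAM, 0);
	if (fd < 0) {
		perror("socket");
		return 1;
	}

	/* Ask for dctcp on this socket only; fails if the module isn't available. */
	const char *cc = "dctcp";
	if (setsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, cc, strlen(cc)) < 0)
		perror("setsockopt(TCP_CONGESTION)");

	/* Read back what the stack actually selected. */
	char buf[16] = "";
	socklen_t len = sizeof(buf);
	if (getsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, buf, &len) == 0)
		printf("congestion control: %s\n", buf);

	return 0;
}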

/John



