[PATCH v2 2/3] nvme-tcp: support specifying the congestion-control

Mingbao Sun sunmingbao at tom.com
Sun Mar 13 18:34:29 PDT 2022


Before answering the questions, I’d like to address the motivation
behind this patchset.

As you know, InfiniBand/RoCE provides NVMe-oF with a lossless network
environment (that is, zero packet loss), which is a great advantage
for performance.

In contrast, 'TCP/IP + Ethernet' is usually a lossy network
environment (packet drops occur frequently).
Once a packet is dropped, timeout-based retransmission may be
triggered, and when that happens the bandwidth suddenly drops to
nearly zero. This does great damage to performance.

So although NVMe/TCP may offer bandwidth competitive with that of
NVMe/RDMA, the packet dropping of the former is a flaw in its
performance.

However, when the following conditions are combined, NVMe/TCP can
become much more competitive with NVMe/RDMA in the data center.

  - Ethernet NICs supporting QoS configuration (mapping the TOS/DSCP
    field of the IP header to a priority, adjusting the buffer size of
    each priority, and supporting PFC)

  - Ethernet switches supporting ECN marking and adjusting the buffer
    size of each priority

  - NVMe/TCP supporting specification of the ToS for its TCP traffic
    (already implemented)

  - NVMe/TCP supporting specification of dctcp as the congestion
    control of its TCP sockets (the work of this feature; see the
    sketch below)

So this feature is the last item on the software side needed to form
the above combination.
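
To illustrate the mechanism (this is not the patch code itself), the
last two items boil down to two ordinary per-socket knobs. Below is a
minimal userspace sketch, assuming the dctcp module is available on
the host and using an arbitrary example DSCP value; the kernel side of
nvme-tcp would apply the corresponding settings through in-kernel
equivalents of these socket options:

/*
 * Minimal userspace sketch (illustration only, not the kernel patch):
 * setting the IP ToS/DSCP and the TCP congestion-control algorithm on
 * a TCP socket.  The DSCP value below is an arbitrary example; dctcp
 * must be built in or loaded on the host.
 */
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

int main(void)
{
	int fd = socket(AF_INET, SOCK_STREAM, 0);
	int tos = 0x68;             /* example ToS/DSCP value, deployment-specific */
	const char cc[] = "dctcp";

	if (fd < 0) {
		perror("socket");
		return 1;
	}

	/* Mark the IP header so NIC/switch QoS can classify this traffic. */
	if (setsockopt(fd, IPPROTO_IP, IP_TOS, &tos, sizeof(tos)) < 0)
		perror("setsockopt(IP_TOS)");

	/* Use dctcp instead of the system-default congestion control. */
	if (setsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, cc, strlen(cc)) < 0)
		perror("setsockopt(TCP_CONGESTION)");

	return 0;
}

The point of pairing the two is that dctcp relies on the ECN marks
from the switches, while the ToS/DSCP marking lets the NICs and
switches place the NVMe/TCP traffic into the priority/buffer
configuration described above.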


