[PATCH 0/2] NVMe_over_TCP: support specifying the congestion-control

Mingbao Sun sunmingbao at tom.com
Tue Mar 8 05:03:20 PST 2022


I'd like to elaborate on this a little more to explain the motivation
behind this feature.

As is well known, InfiniBand/RoCE provides NVMe-oF with a lossless
network environment (that is, zero packet loss), which is a great
advantage for performance.

In contrast, 'TCP/IP + Ethernet' typically forms a lossy network
environment in which packet drops occur frequently. Once a packet is
dropped, a retransmission timeout may be triggered, and retransmission
timeouts do great damage to performance.

So although NVMe/TCP may achieve bandwidth competitive with that of
NVMe/RDMA, packet dropping remains a flaw in its performance.

However, when the following conditions are combined, NVMe/TCP can be
almost as competitive as NVMe/RDMA in the data center:

  - Ethernet NICs that support QoS configuration (mapping the TOS/DSCP
    field of the IP header to a priority, and supporting PFC)

  - Ethernet switches that support ECN marking and adjusting the
    buffer size of each priority

  - NVMe/TCP support for specifying the TOS of its TCP traffic
    (already implemented)

  - NVMe/TCP support for specifying dctcp as the congestion control of
    its TCP sockets (the work of this feature; see the sketch below)

So this feature is the last software piece needed to complete the
above combination.
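
For illustration only, below is a minimal userspace sketch (not the
kernel-side implementation in these patches) of the two socket options
that the last two items above rely on: IP_TOS for the TOS/DSCP marking
and TCP_CONGESTION for selecting dctcp. The TOS value 0x10 is an
arbitrary example, and selecting dctcp assumes the tcp_dctcp module is
available on the host.

    #include <stdio.h>
    #include <string.h>
    #include <netinet/in.h>
    #include <netinet/tcp.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void)
    {
            int fd = socket(AF_INET, SOCK_STREAM, 0);

            if (fd < 0) {
                    perror("socket");
                    return 1;
            }

            /* Mark the traffic so NICs/switches can map it to a
             * priority. 0x10 is only an example value.
             */
            int tos = 0x10;

            if (setsockopt(fd, IPPROTO_IP, IP_TOS, &tos, sizeof(tos)) < 0)
                    perror("setsockopt(IP_TOS)");

            /* Select dctcp as the congestion control of this socket.
             * Fails (e.g. ENOENT) if tcp_dctcp is not loaded.
             */
            if (setsockopt(fd, IPPROTO_TCP, TCP_CONGESTION,
                           "dctcp", strlen("dctcp")) < 0)
                    perror("setsockopt(TCP_CONGESTION)");

            /* Read back what the kernel actually selected. */
            char buf[16] = "";
            socklen_t len = sizeof(buf);

            if (getsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, buf, &len) == 0)
                    printf("congestion control: %s\n", buf);

            close(fd);
            return 0;
    }

The patches aim to let nvme-tcp apply an equivalent congestion-control
setting on its queue sockets, so that the TCP traffic reacts to ECN
marks from the switches instead of waiting for packet drops.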



