[PATCHv6 1/1] nvme-tcp: Add option to set the physical interface to be used when connecting over TCP sockets.

Sagi Grimberg sagi at grimberg.me
Thu May 20 11:54:01 PDT 2021


>>> Addressed Sagi's review from PATCHv5.
>>
>> This commentary belongs after the '---' separator.
>>
>>>
>>> In our application, we need a way to force TCP connections to go out a
>>> specific IP interface instead of letting Linux select the interface
>>> based on the routing tables. This patch adds the option 'host-iface'
>>> to allow specifying the interface to use. Note that corresponding
>>> changes to the nvme-cli utility will follow.
>>>
>>> When the option host-iface is specified, the driver uses the specified
>>> interface to set the option SO_BINDTODEVICE on the TCP socket before
>>> connecting.
>>>
>>> This new option is needed in addition to the existing host-traddr for
>>> the following reasons:
>>>
>>> Specifying an IP interface by its associated IP address is less
>>> intuitive than specifying the actual interface name and, in some
>>> cases, simply doesn't work. That's because the association between
>>> interfaces and IP addresses is not predictable. IP addresses can be
>>> changed or can change by themselves over time (e.g. DHCP). Interface
>>> names are predictable [1] and will persist over time. Consider the
>>> following configuration.
>>>
>>> 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state ...
>>>       link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
>>>       inet 100.0.0.100/24 scope global lo
>>>          valid_lft forever preferred_lft forever
>>> 2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc ...
>>>       link/ether 08:00:27:21:65:ec brd ff:ff:ff:ff:ff:ff
>>>       inet 100.0.0.100/24 scope global enp0s3
>>>          valid_lft forever preferred_lft forever
>>> 3: enp0s8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc ...
>>>       link/ether 08:00:27:4f:95:5c brd ff:ff:ff:ff:ff:ff
>>>       inet 100.0.0.100/24 scope global enp0s8
>>>          valid_lft forever preferred_lft forever
>>>
>>> The above is a VM that I configured with the same IP address
>>> (100.0.0.100) on all interfaces. Doing a reverse lookup to identify
>>> the unique interface associated with 100.0.0.100 does not work here.
>>> And this is why the option host_iface is required. I understand that
>>> the above config does not represent a standard host system, but I'm
>>> using this to prove a point: "We can never know how users will
>>> configure their systems". By the way, the above configuration is
>>> perfectly valid under Linux.
>>>
>>> The current TCP implementation for host_traddr performs a
>>> bind()-before-connect(). This is a common construct to set the source
>>> IP address on a TCP socket before connecting. This has no effect on
>>> how Linux selects the interface for the connection. That's because
>>> Linux uses the Weak End System model as described in RFC1122 [2]. On
>>> the other hand, setting the Source IP Address has benefits and should
>>> be supported by linux-nvme. In fact, setting the Source IP Address is
>>> a mandatory FedGov requirement (e.g. connection to a RADIUS/TACACS+
>>> server).
>>> Consider the following configuration.
>>>
>>> $ ip addr list dev enp0s8
>>> 3: enp0s8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc ...
>>>       link/ether 08:00:27:4f:95:5c brd ff:ff:ff:ff:ff:ff
>>>       inet 192.168.56.101/24 brd 192.168.56.255 scope global enp0s8
>>>          valid_lft 426sec preferred_lft 426sec
>>>       inet 192.168.56.102/24 scope global secondary enp0s8
>>>          valid_lft forever preferred_lft forever
>>>       inet 192.168.56.103/24 scope global secondary enp0s8
>>>          valid_lft forever preferred_lft forever
>>>       inet 192.168.56.104/24 scope global secondary enp0s8
>>>          valid_lft forever preferred_lft forever
>>>
>>> Here we can see that several addresses are associated with interface
>>> enp0s8. By default, Linux always selects the default IP address,
>>> 192.168.56.101, as the source address when connecting over interface
>>> enp0s8. Some users, however, want the ability to specify a different
>>> source address (e.g., 192.168.56.102, 192.168.56.103, ...). The option
>>> host_traddr can be used as-is to perform this function.
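
As a rough userspace sketch of the bind()-before-connect() construct described above, assuming IPv4 only: the helper names make_source_addr() and connect_from() are hypothetical and are not taken from the patch. The source port is left at 0 so the kernel picks an ephemeral port; only the source address is fixed, and interface selection is still left to routing.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: fill 'out' with the IPv4 source address 'ip'
 * and port 0 (kernel-chosen ephemeral port).
 * Returns 0 on success, -1 with errno = EINVAL if 'ip' does not parse.
 */
static int make_source_addr(const char *ip, struct sockaddr_in *out)
{
	memset(out, 0, sizeof(*out));
	out->sin_family = AF_INET;
	out->sin_port = htons(0);
	if (ip == NULL || inet_pton(AF_INET, ip, &out->sin_addr) != 1) {
		errno = EINVAL;
		return -1;
	}
	return 0;
}

/*
 * Hypothetical helper: bind 'fd' to the source address 'src_ip', then
 * connect to 'dst'. This sets the source address of the connection but
 * does not by itself choose the outgoing interface.
 */
static int connect_from(int fd, const char *src_ip,
			const struct sockaddr *dst, socklen_t dst_len)
{
	struct sockaddr_in src;

	if (make_source_addr(src_ip, &src) != 0)
		return -1;
	if (bind(fd, (const struct sockaddr *)&src, sizeof(src)) != 0)
		return -1;
	return connect(fd, dst, dst_len);
}
```

With the configuration shown above, passing "192.168.56.102" as src_ip would request that secondary address instead of the default 192.168.56.101.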
>>>
>>> In conclusion, I believe that we need 2 options for TCP connections.
>>> One that can be used to specify an interface (host-iface). And one
>>> that can be used to set the source address (host-traddr). Users should
>>> be allowed to use one or the other, or both, or none. Of course, the
>>> documentation for host_traddr will need some clarification. It should
>>> state that when used for TCP connections, this option only sets the
>>> source address. And the documentation for host_iface should say that
>>> this option is only available for TCP connections.
>>>
>>> References:
>>> [1] https://www.freedesktop.org/wiki/Software/systemd/PredictableNetworkInterfaceNames/
>>> [2] https://tools.ietf.org/html/rfc1122
>>>
>>> Tested both IPv4 and IPv6 connections.
>>
>> Also this.
>>
>> Can you send the nvme-cli bits as well?
> 
> Hi Sagi,
> 
> Just checking if there is anything else I can do to help with this patch?

I think just the change log fixes.

Also you can add my:
Reviewed-by: Sagi Grimberg <sagi at grimberg.me>


