[PATCHv6 1/1] nvme-tcp: Add option to set the physical interface to be used when connecting over TCP sockets.

Sagi Grimberg sagi at grimberg.me
Mon May 17 11:25:03 PDT 2021



On 5/17/21 11:16 AM, Martin Belanger wrote:
> From: Martin Belanger <martin.belanger at dell.com>
> 
> Addressed Sagi's review from PATCHv5.

This commentary belongs after the '---' separator.

> 
> In our application, we need a way to force TCP connections to go out a
> specific IP interface instead of letting Linux select the interface
> based on the routing tables. This patch adds the option 'host-iface'
> to allow specifying the interface to use. Note that corresponding
> changes to the nvme-cli utility will follow.
> 
> When the option host-iface is specified, the driver uses the specified
> interface to set the option SO_BINDTODEVICE on the TCP socket before
> connecting.
> 
> This new option is needed in addition to the existing host-traddr for
> the following reasons:
> 
> Specifying an IP interface by its associated IP address is less
> intuitive than specifying the actual interface name and, in some cases,
> simply doesn't work. That's because the association between interfaces
> and IP addresses is not predictable. IP addresses can be changed or can
> change by themselves over time (e.g. DHCP). Interface names are
> predictable [1] and will persist over time. Consider the following
> configuration.
> 
> 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state ...
>      link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
>      inet 100.0.0.100/24 scope global lo
>         valid_lft forever preferred_lft forever
> 2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc ...
>      link/ether 08:00:27:21:65:ec brd ff:ff:ff:ff:ff:ff
>      inet 100.0.0.100/24 scope global enp0s3
>         valid_lft forever preferred_lft forever
> 3: enp0s8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc ...
>      link/ether 08:00:27:4f:95:5c brd ff:ff:ff:ff:ff:ff
>      inet 100.0.0.100/24 scope global enp0s8
>         valid_lft forever preferred_lft forever
> 
> The above is a VM that I configured with the same IP address
> (100.0.0.100) on all interfaces. Doing a reverse lookup to identify the
> unique interface associated with 100.0.0.100 does not work here. And
> this is why the option host_iface is required. I understand that the
> above config does not represent a standard host system, but I'm using
> this to prove a point: "We can never know how users will configure
> their systems". By the way, the above configuration is perfectly fine
> by Linux.
> 
> The current TCP implementation for host_traddr performs a
> bind()-before-connect(). This is a common construct to set the source
> IP address on a TCP socket before connecting. This has no effect on how
> Linux selects the interface for the connection. That's because Linux
> uses the Weak End System model as described in RFC1122 [2]. On the other
> hand, setting the Source IP Address has benefits and should be supported
> by linux-nvme. In fact, setting the Source IP Address is a mandatory
> FedGov requirement (e.g. connection to a RADIUS/TACACS+ server).
> Consider the following configuration.
> 
> $ ip addr list dev enp0s8
> 3: enp0s8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc ...
>      link/ether 08:00:27:4f:95:5c brd ff:ff:ff:ff:ff:ff
>      inet 192.168.56.101/24 brd 192.168.56.255 scope global enp0s8
>         valid_lft 426sec preferred_lft 426sec
>      inet 192.168.56.102/24 scope global secondary enp0s8
>         valid_lft forever preferred_lft forever
>      inet 192.168.56.103/24 scope global secondary enp0s8
>         valid_lft forever preferred_lft forever
>      inet 192.168.56.104/24 scope global secondary enp0s8
>         valid_lft forever preferred_lft forever
> 
> Here we can see that several addresses are associated with interface
> enp0s8. By default, Linux always selects the default IP address,
> 192.168.56.101, as the source address when connecting over interface
> enp0s8. Some users, however, want the ability to specify a different
> source address (e.g., 192.168.56.102, 192.168.56.103, ...). The option
> host_traddr can be used as-is to perform this function.
> 
> In conclusion, I believe that we need 2 options for TCP connections.
> One that can be used to specify an interface (host-iface). And one that
> can be used to set the source address (host-traddr). Users should be
> allowed to use one or the other, or both, or none. Of course, the
> documentation for host_traddr will need some clarification. It should
> state that when used for TCP connection, this option only sets the
> source address. And the documentation for host_iface should say that
> this option is only available for TCP connections.
> 
> References:
> [1] https://www.freedesktop.org/wiki/Software/systemd/PredictableNetworkInterfaceNames/
> [2] https://tools.ietf.org/html/rfc1122
> 
> Tested both IPv4 and IPv6 connections.

This test note also belongs after the '---' separator.

Can you send the nvme-cli bits as well?
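
For readers following the thread, the socket setup described in the quoted
commit message can be sketched from userspace. This is only an illustration,
not the kernel code in the patch: the function name connect_from and its
parameters are hypothetical, it handles IPv4 only, and it assumes Linux, where
setting SO_BINDTODEVICE may require CAP_NET_RAW.

```python
import socket


def connect_from(dest, host_iface=None, host_traddr=None, timeout=5.0):
    """Open a TCP connection to dest = (ip, port).

    host_iface  -- interface name; pins the egress interface (SO_BINDTODEVICE)
    host_traddr -- source IP address; set with bind() before connect()
    Either, both, or neither may be given. (Hypothetical helper, IPv4 only.)
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.settimeout(timeout)
        if host_iface is not None:
            # Linux-specific; restricts the socket to one interface,
            # overriding the routing table's choice of interface.
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_BINDTODEVICE,
                            host_iface.encode() + b"\0")
        if host_traddr is not None:
            # Port 0 lets the kernel pick an ephemeral source port.
            # This sets the source address only, not the interface.
            sock.bind((host_traddr, 0))
        sock.connect(dest)
    except OSError:
        sock.close()
        raise
    return sock
```

With the addresses from the quoted example, a call such as
connect_from((target_ip, 4420), host_iface="enp0s8",
host_traddr="192.168.56.102") would use both options at once, where target_ip
stands for a controller address that the thread does not give.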


