[PATCH 1/1] Add 'Transport Interface' (triface) option. This can be used to specify the IP interface to use for the connection. The driver uses that to set SO_BINDTODEVICE on the socket before connecting.

Belanger, Martin Martin.Belanger at dell.com
Tue May 11 14:41:32 BST 2021


> >>>> We already support this for IPv6, we can do that also for IPv4, but
> >>>> this syntax may not be trivially expected for ipv4?
> >>>
> >>> I tried this for IPv6 and it doesn't work. Here's what I get:
> >>> $ sudo nvme discover -g -G -t tcp -s 8009 -a fe80::800:27ff:fe00:0
> >>> Failed to write to /dev/nvme-fabrics: Invalid argument
> >>> $ sudo nvme discover -g -G -t tcp -s 8009 -a fe80::800:27ff:fe00:0%enp0s8
> >>> Failed to write to /dev/nvme-fabrics: Invalid argument
> >>> $ sudo nvme discover -g -G -t tcp -s 8009 -a [fe80::800:27ff:fe00:0]
> >>> failed to resolve host [fe80::800:27ff:fe00:0] info
> >>> $ sudo nvme discover -g -G -t tcp -s 8009 -a [fe80::800:27ff:fe00:0%enp0s8]
> >>> failed to resolve host [fe80::800:27ff:fe00:0%enp0s8] info
> >>
> >> # nvme discover -t tcp -a fe80::5054:ff:fef1:9f3b -w fe80::5054:ff:fe28:5edb%enp6s0
> >
> > Thanks for clarifying the syntax. However, that doesn't work for me.
> >
> > # nvme discover -t tcp -a fe80::800:27ff:fe00:0 -w fe80::9266:4855:6cf2:f7e9%enp0s8
> > Failed to write to /dev/nvme-fabrics: Connection refused
> 
> Are you using the linux target? connection refused means that you don't
> have a listener on it, it's not a resolution error.
> 
> did you have the target listen on fe80::800:27ff:fe00:0%<intf> ?

Doh! You are correct. In my setup, I run the nvme-cli client on a VM and the target (nvmet) on the host computer. I had nvmet configured to listen on "0.0.0.0" (all IPv4 addresses) instead of "::" (all IPv6 addresses), so nothing was listening on the IPv6 link-local address.

After changing nvmet's configuration, I was able to query the discovery log pages, using this syntax:
nvme discover -t tcp -a fe80::800:27ff:fe00:0 -w fe80::9266:4855:6cf2:f7ea%enp0s8

Note that it doesn't work when I append the interface to the Destination IP address as per RFC4007 (as ping does):
nvme discover -t tcp -a fe80::800:27ff:fe00:0%enp0s8 -w fe80::9266:4855:6cf2:f7ea
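
For reference, here is a rough sketch of how an RFC4007-style destination such as the one above could be split into an address and a zone (interface) before connecting. It is Python for illustration only, not nvme-cli or kernel code; `split_zone` and `zone_to_scope_id` are hypothetical helper names.

```python
import socket


def split_zone(traddr):
    """Split 'addr%zone' into (addr, zone); zone is None when absent."""
    addr, sep, zone = traddr.partition("%")
    if sep and not zone:
        raise ValueError("empty zone after '%' in " + repr(traddr))
    return addr, (zone if sep else None)


def zone_to_scope_id(zone):
    """Map a zone (interface name or numeric index) to an IPv6 scope id."""
    if zone is None:
        return 0
    if zone.isdecimal():
        return int(zone)
    # Raises OSError if no interface with that name exists on this host.
    return socket.if_nametoindex(zone)


print(split_zone("fe80::800:27ff:fe00:0%enp0s8"))
print(split_zone("fe80::800:27ff:fe00:0"))
```

The zone, when present, identifies the interface on its own, without any reference to a source address.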

> 
> >
> > Note that the above syntax does not comply with RFC4007. The '%'
> > delimiter is supposed to be appended to the Destination IP address and not
> > the Source Address. In other words, to be RFC4007-compliant, the syntax
> > should be (using your example):
> >
> > # nvme discover -t tcp -a fe80::5054:ff:fef1:9f3b%enp6s0 -w fe80::5054:ff:fe28:5edb
> >
> > This tells nvme-cli to connect to a controller at address
> > fe80::5054:ff:fef1:9f3b using interface enp6s0 for the connection, and to
> > set the Source address to fe80::5054:ff:fe28:5edb.
> 
> This also seems to work, not sure that it does what we want though...
> nvme discover -t tcp -a fe80::5054:ff:fef1:9f3b%enp6s0 -w fe80::5054:ff:fe28:5edb%enp6s0
> 
> Discovery Log Number of Records 1, Generation counter 5
> =====Discovery Log Entry 0======
> trtype:  tcp
> adrfam:  ipv6
> subtype: nvme subsystem
> treq:    not specified, sq flow control disable supported
> portid:  3
> trsvcid: 8009
> subnqn:  testnqn1
> traddr:  fe80::5054:ff:fef1:9f3b%enp6s0
> sectype: none
> 
> 
> >> The '%' may be confusing when it comes to other transports as well (e.g.
> >> rdma/fc would have to either reject or ignore it, but regardless of
> >> how we add it that would be the case). Having host-traddr accept
> >> either ip or interface seems the most desirable, however that won't
> >> work if there are 2 interfaces that share multiple ip addresses. So
> >> if this is a requirement we'll probably need to add --host-iface as another
> >> option...
> >
> > I don’t grok what you mean by "that won't work if there are 2 interfaces
> > that share multiple ip addresses". Why not? If one specifies the interface by
> > its name (e.g. enp0s8), there is no possible confusion even if multiple
> > interfaces share the same IP addresses.
> >
> > The following are some examples of how nvme-cli should work to comply
> > with RFC4007 and be consistent with the way ping operates.
> > Example 1 - IPv4, Specify Interface with -w and let Linux select Source
> > address:
> > nvme discover -t tcp -a 192.168.1.9 -w enp0s8
> >
> > Example 2 - IPv4, Specify Interface and Source address with repeated -w:
> > nvme discover -t tcp -a 192.168.1.9 -w enp0s8 -w 192.168.56.103
> 
> I meant without the repetitions, which you only need if you have 2 devices
> that share more than one address, which again, is not a clear use-case to
> me, but without repetitions we won't support that.

I've been thinking about what you said regarding the need to repeat the -w option when two interfaces share the same IP address. I think we're looking at the problem from different points of view. The current implementation uses an IP address to identify an interface. I, on the other hand, believe that the best way to identify an interface is by its name or index. In previous emails, I provided examples of the problems that may occur when using an IP address to identify an interface. For example, one can assign the same IP address to different interfaces, making it impossible to distinguish interfaces by their IP address alone. Another example is that the low-level APIs (e.g. setsockopt(SO_BINDTODEVICE)) don’t even require the source IP address. They only need the interface name/index. So, why go through the trouble of performing a reverse address lookup to retrieve the interface name/index when the address is not used at all?

By the way, if nvme-cli/linux-nvme allowed specifying interfaces by name/index, then we would not really need to repeat the -w option unless we also wanted to set the source address at the same time. Setting the source address is a completely different thing from setting the interface. One should be allowed to set one independently from the other, or both, or none.
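
To illustrate the independence of the two settings, here is a minimal user-space sketch in Python. It is not the nvme-tcp driver code; `apply_host_options` is a hypothetical helper, and the fallback value 25 for `SO_BINDTODEVICE` is the Linux constant, assumed only when the `socket` module does not export it.

```python
import socket

# Linux value of SO_BINDTODEVICE, used only if the socket module lacks it.
SO_BINDTODEVICE = getattr(socket, "SO_BINDTODEVICE", 25)


def apply_host_options(sock, iface=None, src_addr=None):
    """Apply the interface and the source address independently of each other."""
    if iface is not None:
        # Only the interface name is needed here; no IP address is involved.
        sock.setsockopt(socket.SOL_SOCKET, SO_BINDTODEVICE, iface.encode())
    if src_addr is not None:
        # Port 0 lets the kernel choose the local port.
        sock.bind((src_addr, 0))


# Example use (may require elevated privileges, so it is left commented out):
# s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# apply_host_options(s, iface="enp0s8", src_addr="192.168.56.103")
# s.connect(("192.168.1.9", 8009))
```

Either argument can be omitted: passing only `iface` leaves the source address selection to Linux, and passing only `src_addr` leaves the interface selection to the routing table.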

If you look at how ping is implemented, it does not infer the interface from the IP address. To force ping to go over an interface, one must provide the interface by name/index using the -I option. To change the source IP address (without forcing a specific interface), one provides the IP address to the -I option instead. It's simple and intuitive. And ping also supports appending the interface to the Destination IP using the '%' delimiter, for IPv6 only, as per RFC4007.

I think that nvme-cli/linux-nvme should follow the ping approach. Interfaces should never be inferred from source IP addresses, but instead be clearly identified by their name or index. And setting the source address should be independent from setting the interface.
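
As a sketch of how a single option could accept either an interface name or a source address, the following Python snippet treats any value that parses as an IP literal as an address and anything else as an interface name. This is one possible approach for illustration, not necessarily how ping or nvme-cli does it; `classify_host_arg` is a hypothetical name.

```python
import ipaddress


def classify_host_arg(value):
    """Return ('addr', value) for an IP literal, else ('iface', value)."""
    # Drop any '%zone' suffix before parsing; older Python versions reject it.
    literal = value.partition("%")[0]
    try:
        ipaddress.ip_address(literal)
    except ValueError:
        return ("iface", value)
    return ("addr", value)


for arg in ("enp0s8", "192.168.56.103", "fe80::9266:4855:6cf2:f7e9"):
    print(arg, "->", classify_host_arg(arg)[0])
```

With this kind of classification the interface is never derived from an address: a value is either one or the other.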

Regards,
Martin

> 
> > Example 3 - IPv6, Specify Interface with '%' delimiter and let Linux select
> > Source address:
> > nvme discover -t tcp -a fe80::800:27ff:fe00:0%enp0s8
> >
> > Example 4 - IPv6, Specify Interface with -w and let Linux select Source
> > address:
> > nvme discover -t tcp -a fe80::800:27ff:fe00:0 -w enp0s8
> >
> > Example 5 - IPv6, Specify Interface with '%' delimiter and Source address
> > with -w:
> > nvme discover -t tcp -a fe80::800:27ff:fe00:0%enp0s8 -w fe80::9266:4855:6cf2:f7e9
> >
> > Example 6 - IPv6, Specify Interface and Source address with repeated -w:
> > nvme discover -t tcp -a fe80::800:27ff:fe00:0 -w enp0s8 -w fe80::9266:4855:6cf2:f7e9
> >
> > Martin
> >

