[PATCH 1/1] Add 'Transport Interface' (triface) option. This can be used to specify the IP interface to use for the connection. The driver uses that to set SO_BINDTODEVICE on the socket before connecting.

Belanger, Martin Martin.Belanger at dell.com
Mon May 10 20:18:12 BST 2021


> >>> ping <dest-ip-addr>%<interface>
> >>
> >> Ping only supports this syntax for IPv6 no?
> >>
> >>> Extending this approach to nvme-cli, we arrive at something like this:
> >>>
> >>> nvme discover --traddr 100.64.29.2%enp0s8 --host-traddr 192.168.56.102
> >> ....
> >>
> >> We already support this for IPv6, we can do that also for IPv4, but
> >> this syntax may not be trivially expected for ipv4?
> >
> > I tried this for IPv6 and it doesn't work. Here's what I get:
> > $ sudo nvme discover -g -G -t tcp -s 8009 -a fe80::800:27ff:fe00:0
> > Failed to write to /dev/nvme-fabrics: Invalid argument
> > $ sudo nvme discover -g -G -t tcp -s 8009 -a fe80::800:27ff:fe00:0%enp0s8
> > Failed to write to /dev/nvme-fabrics: Invalid argument
> > $ sudo nvme discover -g -G -t tcp -s 8009 -a [fe80::800:27ff:fe00:0]
> > failed to resolve host [fe80::800:27ff:fe00:0] info
> > $ sudo nvme discover -g -G -t tcp -s 8009 -a [fe80::800:27ff:fe00:0%enp0s8]
> > failed to resolve host [fe80::800:27ff:fe00:0%enp0s8] info
> 
> # nvme discover -t tcp -a fe80::5054:ff:fef1:9f3b -w fe80::5054:ff:fe28:5edb%enp6s0

Thanks for clarifying the syntax. However, that doesn't work for me. 

# nvme discover -t tcp -a fe80::800:27ff:fe00:0 -w fe80::9266:4855:6cf2:f7e9%enp0s8
Failed to write to /dev/nvme-fabrics: Connection refused

Note that the above syntax does not comply with RFC4007. The '%' delimiter is supposed to be appended to the Destination IP address and not the Source Address. In other words, to be RFC4007-compliant, the syntax should be (using your example):

# nvme discover -t tcp -a fe80::5054:ff:fef1:9f3b%enp6s0 -w fe80::5054:ff:fe28:5edb

This tells nvme-cli to connect to a controller at address fe80::5054:ff:fef1:9f3b using interface enp6s0 for the connection, and to set the Source address to fe80::5054:ff:fe28:5edb.
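To make the RFC4007 form concrete, here is a minimal Python sketch of splitting a destination such as fe80::5054:ff:fef1:9f3b%enp6s0 into its address and its zone (the interface). The helper name split_scoped_address is hypothetical and is not part of nvme-cli or the kernel; it assumes the zone is an interface name and, because RFC4007 defines zones for IPv6 only, it rejects a '%' suffix on an IPv4 address.

```python
import ipaddress
from typing import Optional, Tuple


def split_scoped_address(traddr: str) -> Tuple[str, Optional[str]]:
    """Split an RFC 4007 'address%zone' string into (address, zone or None).

    Hypothetical helper for illustration only.
    """
    addr, sep, zone = traddr.partition("%")
    if sep and not zone:
        raise ValueError(f"empty zone in {traddr!r}")
    ip = ipaddress.ip_address(addr)  # ValueError if addr is not an IP literal
    if zone and ip.version != 6:
        # RFC 4007 defines zone indices for IPv6 addresses only.
        raise ValueError(f"zone not allowed on IPv4 address: {traddr!r}")
    return str(ip), (zone or None)
```

On a real host, the zone string could then be mapped to an interface index with socket.if_nametoindex(), which raises OSError if no such interface exists.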

> 
> Discovery Log Number of Records 1, Generation counter 5
> =====Discovery Log Entry 0======
> trtype:  tcp
> adrfam:  ipv6
> subtype: nvme subsystem
> treq:    not specified, sq flow control disable supported
> portid:  3
> trsvcid: 8009
> subnqn:  testnqn1
> traddr:  fe80::5054:ff:fef1:9f3b%enp6s0
> sectype: none
> 
> >
> >>
> >>> This tells nvme to connect to 100.64.29.2 on interface enp0s8. We make
> >>> no change to the --host-traddr option. It continues to be used to
> >>> specify the Source IP address only (for the rare cases where users
> >>> want to specify a Source Address other than the default). With this,
> >>> the interface is specified by name and not by its associated address.
> >>> This is not only more intuitive, but, as I stated before, eliminates
> >>> the problem caused by mapping the same IP address to multiple
> >>> interfaces (not to mention that doing a reverse lookup on an IP
> >>> address to find the interface is extra work that we don’t need to do
> >>> in kernel space).
> >>
> >> Maybe we do something like ping -I for host_traddr, from ping man pages:
> >>
> >> -I interface
> >>        interface is either an address, an interface name or a VRF name.
> >>        If interface is an address, it sets source address to specified
> >>        interface address. If interface is an interface name, it sets
> >>        source interface to specified interface. If interface is a VRF
> >>        name, each packet is routed using the corresponding routing
> >>        table; in this case, the -I option can be repeated to specify a
> >>        source address. NOTE: For IPv6, when doing ping to a link-local
> >>        scope address, link specification (by the '%'-notation in
> >>        destination, or by this option) can be used but it is no longer
> >>        required.
> >>
> >>
> >> Without the repetition though, unless we need to support two
> >> interfaces that share the same multiple addresses in the same subnet,
> >> which sounds completely crazy to me...
> >
> > Hi Sagi,
> >
> > If we want to follow ping as an example, the repetition is needed not to
> > specify two interfaces, but to specify an interface and the source
> > address. In a previous example (reproduced below), I described a
> > configuration where an interface had several addresses assigned to it.
> > By default, Linux always picks the same Source address (i.e.
> > 192.168.56.101 in this example) when connecting. If a user wants a
> > different source address, they need a way to specify it (currently with
> > --host-traddr). Users also need a way to specify an interface separately
> > from the source address (either with a new option like --host-iface or
> > by repeating --host-traddr). With the example below, if we wanted to
> > force ping to use interface enp0s8 and source address 192.168.56.103, we
> > would repeat the -I option, for example
> > "ping -I enp0s8 -I 192.168.56.103". We need a way to do the same with
> > nvme-cli.
> >
> > I thought that introducing a new option, "--host-iface", would have the
> > smallest impact since it requires fewer code changes, but that was
> > turned down (not sure exactly why). I then suggested that we use the '%'
> > delimiter for IPv4 and IPv6. I agree that it is not 100% the same as
> > ping since ping only allows the '%' delimiter for IPv6 addresses (as per
> > RFC4007). As you suggested, we could repeat the --host-traddr option
> > (e.g. --host-traddr enp0s8 --host-traddr 192.168.56.103), but this is
> > more impactful to the code than adding a separate --host-iface option.
> 
> It's less about code changes and more about adding a new user ABI; that
> is the reason why (at least I'm not fully on board just yet).
> 
> > EXAMPLE: Interface with several addresses assigned:
> > $ ip addr list dev enp0s8
> > 3: enp0s8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 ...
> >        link/ether 08:00:27:4f:95:5c brd ff:ff:ff:ff:ff:ff
> >        inet 192.168.56.101/24 brd 192.168.56.255 scope ...
> >           valid_lft 426sec preferred_lft 426sec
> >        inet 192.168.56.102/24 scope global secondary enp0s8
> >           valid_lft forever preferred_lft forever
> >        inet 192.168.56.103/24 scope global secondary enp0s8
> >           valid_lft forever preferred_lft forever
> >        inet 192.168.56.104/24 scope global secondary enp0s8
> >           valid_lft forever preferred_lft forever
> >
> > In the end, it doesn't really matter (to me) how it is implemented.
> > However, a solution that has little to no impact on existing code would
> > be nice. Just like ping, we need a way to specify an interface by its
> > **interface name** (and not by its associated IP address), and we need
> > to allow users to select which Source IP address to use when there are
> > multiple addresses associated with an interface.
> 
> The '%' may be confusing when it comes to other transports as well (e.g.
> rdma/fc would have to either reject or ignore it, but regardless of how
> we add it that would be the case). Having host-traddr accept either an
> IP address or an interface seems the most desirable; however, that won't
> work if there are 2 interfaces that share multiple IP addresses. So if
> this is a requirement, we'll probably need to add --host-iface as another
> option...

I don’t grok what you mean by "that won't work if there are 2 interfaces that share multiple ip addresses". Why not? If one specifies the interface by its name (e.g. enp0s8), there is no possible confusion even if multiple interfaces share the same IP addresses. 
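The point can be illustrated with a small Python sketch. The address table is hypothetical: enp0s8 reuses some of the addresses from the ip addr output quoted above, and enp0s9 is invented so that two interfaces share an address. The function names are also hypothetical. Resolving by name always yields exactly one interface, while a reverse lookup by address can match several.

```python
from typing import Dict, List

# Hypothetical table: enp0s8 reuses addresses from the `ip addr` output
# above; enp0s9 is invented so that two interfaces share 192.168.56.103.
IFACE_ADDRS: Dict[str, List[str]] = {
    "enp0s8": ["192.168.56.101", "192.168.56.102", "192.168.56.103"],
    "enp0s9": ["192.168.56.103", "192.168.56.104"],
}


def interfaces_for_address(addr: str,
                           table: Dict[str, List[str]] = IFACE_ADDRS) -> List[str]:
    """Reverse lookup: every interface that has addr assigned."""
    return sorted(name for name, addrs in table.items() if addr in addrs)


def resolve_interface(spec: str,
                      table: Dict[str, List[str]] = IFACE_ADDRS) -> str:
    """Resolve an interface given either its name or one of its addresses."""
    if spec in table:
        # An interface name identifies exactly one interface.
        return spec
    matches = interfaces_for_address(spec, table)
    if len(matches) != 1:
        # Zero matches (unknown address) or several (shared address).
        raise LookupError(f"{spec!r} matches interfaces {matches}")
    return matches[0]
```

Here resolve_interface("192.168.56.103") raises LookupError because the address matches both interfaces, whereas resolve_interface("enp0s8") is unambiguous.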

The following are some examples of how nvme-cli should work to comply with RFC4007 and be consistent with the way ping operates.
Example 1 - IPv4, Specify Interface with -w and let Linux select Source address: 
nvme discover -t tcp -a 192.168.1.9 -w enp0s8

Example 2 - IPv4, Specify Interface and Source address with repeated -w:  
nvme discover -t tcp -a 192.168.1.9 -w enp0s8 -w 192.168.56.103

Example 3 - IPv6, Specify Interface with '%' delimiter and let Linux select Source address:
nvme discover -t tcp -a fe80::800:27ff:fe00:0%enp0s8

Example 4 - IPv6, Specify Interface with -w and let Linux select Source address:
nvme discover -t tcp -a fe80::800:27ff:fe00:0 -w enp0s8

Example 5 - IPv6, Specify Interface with '%' delimiter and Source address with -w:
nvme discover -t tcp -a fe80::800:27ff:fe00:0%enp0s8 -w fe80::9266:4855:6cf2:f7e9

Example 6 - IPv6, Specify Interface and Source address with repeated -w: 
nvme discover -t tcp -a fe80::800:27ff:fe00:0 -w enp0s8 -w fe80::9266:4855:6cf2:f7e9

Martin

