nvme-cli/libnvme: Controller reuse policy
Hannes Reinecke
hare at suse.de
Fri Jun 9 07:09:25 PDT 2023
On 6/6/23 21:49, Belanger, Martin wrote:
> I came across an issue using nvme-cli/libnvme. It was suggested that I move parts of the discussion
> to this mailing list.
>
> The relevant info can be found in the libnvme repo here:
> https://github.com/linux-nvme/libnvme/pull/647
> https://github.com/linux-nvme/libnvme/issues/648
>
> My question is related to nvme-oF-TCP, but may apply to other transports as well (I'm just not an
> expert for those other transports).
>
> The question is: How should nvme-cli/libnvme compare existing controllers (in sysfs) to a candidate
> configuration to determine whether to create a new controller or to reuse an existing one? >
Well, for persistent discovery controllers we should, right?
That's kinda the point of a _persistent_ discovery controller ...\
> Background:
>
> The configuration parameters to create a controller are:
> 1. --transport
> 2. --traddr
> 3. --trsvcid
> 4. --nqn
> 5. --host-traddr
> 6. --host-iface (TCP only)
>
> When trying to configure a controller, nvme-cli/libnvme scans the sysfs looking for an existing
> controller that matches the candidate config. If a match is found, nvme-cli/libnvme reuses the
> existing controller instead of creating a new one. nvme-cli/libnvme compares the 6 configuration
> parameters listed above one-by-one to determine if an existing controller matches a candidate
> config. Some of these parameters like "transport" and "traddr" can never be NULL. Others, such
> as "host-traddr" and "host-iface" (and maybe others), may be NULL when not needed.
>
> Currently, nvme-cli/libnvme does a "soft" comparison for "host-traddr" and "host-iface". That
> is to say that if an existing controller was created without specifying one or both parameters
> (i.e. set to NULL), then the NULL parameter will be ignored when comparing against the candidate
> config. This may result in nvme-cli/libnvme reusing an existing controller that does not match
> the candidate "host-traddr" and/or "host-iface".
>
The problem here is that 'host-traddr' and 'host-iface' are linux
concepts, and have no equivalent in the NVMe spec.
While it's easy to do a 1:1 consistency check for a simple 'connect'
command, it gets ever so complicated for the 'connect-all' case:
There we'll have to check each discovery log page entry against a
matching controller in sysfs.
As the discovery log page entry cannot contain any of the 'host-traddr'
and 'host-iface' components, we would always default to create a new
controller for any entry in the discovery log page if the original
connection was specified with either of these arguments.
Which clearly is not what we want.
At the same time we cannot re-use the arguments from the 'connect'
command to the original discovery controller as we might have
encountered a referral or the DLPE might specify a different transport
altogether.
> Why is this important?
>
> We now have support for booting systems over nvme-oF-TCP. What typically
> happens during early boot is that a controller may be created over the
> system's management interface (i.e. the default route). The management
> interface is typically a slower interface (e.g. 1G), which is fine for
> booting a system. However, once the system is fully booted we may want
> to use a different (faster) interface (e.g. 10G) to connect to that same
> subsystem. We can force the new connection on a different interface using
> the "host-iface" parameter. However, because nvme-cli/libnvme performs
> a "soft" compare, it will conclude that the existing (slower) controller
> is "good enough" and reuse it instead of creating a new connection.
>
> One more piece of info (to make things more interesting):
>
> When a controller is created without specifying "host-traddr" and "host-iface",
> the kernel will decide which source-address and interface to use for that
> connection. The interface is selected by looking up the destination address
> (traddr) in the routing table. The source address is selected by retrieving
> the primary address assigned to that interface. It is possible that the values
> picked by the kernel will match the host-iface/host-traddr of a candidate config.
> However, because they are not exposed in the sysfs, the comparison will conclude
> that none of the existing controllers match the candidate config (when in fact
> there is a match). This is one of the reasons why I added the "src_addr" parameter
> to the sysfs (kernel 6.1). For TCP connections, the kernel now displays the actual
> source address of the connection through the "src_addr" attribute. This can be
> used to identify not only the source address, but also on which interface the
> connection was made (by doing an interface lookup to find out which one matches
> the source address).
>
> In conclusion, we have 3 choices to solve the issue of matching an existing
> controller to a candidate configuration:
>
> 1) We can leave everything the way it is now. In some cases, nvme-cli/libnvme
> may erroneously conclude that an existing controller is "good enough" and not
> allow someone to make a different connection using a different "host-traddr"
> and/or "host-iface". We may want to document that behavior so that people
> know what to expect.
Why 'errorneously'? If the user has specified 'host-traddr' and/or
'host-iface' we'll be selecting a controller which matches these
criteria. _If_ we find several matching controllers it means that the
search criteria are insufficient, and obviously the user didn't care.
> 2) We can change the code so that nvme-cli/libnvme does a "strong"
> comparison on the "host-traddr" and/or "host-iface". This may result
> in duplicate connections because an existing controller that was
> configured without specifying the "host-traddr" and/or "host-iface"
> may in fact be assigned internally the same "host-traddr" and/or
> "host-iface" of the candidate configuration and therefore could
> have been reused.
I don't think this it a good idea.
As outlined above, we cannot have all information for the 'connect-all'
case, so we'll always end up creating a new controller there.
> 3) For the case where an existing controller was created without
> specifying the "host-traddr" and/or "host-iface" (and therefore
> the kernel picked those automatically), we can look at the existing
> controller's "src_addr" attribute to determine if there is a match
> with the candidate config's "host-traddr" and/or "host-iface". With
> this, we are guaranteed to never have duplicate connections and we
> will always know when an existing connection can be reused for a
> candidate configuration. However, this requires doing a lookup of
> all the interfaces to find out which one matches the "src_addr"
> (the lookup can be done through the netlink interface).
>
Hmm. I don't think the is a different case; it's just the opposite
of case 1) (ie we have insufficient information to compare the existing
controller with).
I would shoot for having a 'lazy' check, in the sense that a
non-existing setting host-iface or host-traddr is an automatic match
(as it's not specified, so clearly it's not important).
That will allow for persistent discovery controllers to work
'best', and at the same time to allow for the admin to override
the default behaviour by specifying all arguments.
Cheers,
Hannes
More information about the Linux-nvme
mailing list