nvme-cli/libnvme: Controller reuse policy
Belanger, Martin
Martin.Belanger at dell.com
Fri Jun 9 08:11:50 PDT 2023
Thanks for your response Hannes.
> On 6/6/23 21:49, Belanger, Martin wrote:
> > I came across an issue using nvme-cli/libnvme. It was suggested that I
> > move parts of the discussion to this mailing list.
> >
> > The relevant info can be found in the libnvme repo here:
> > https://urldefense.com/v3/__https://github.com/linux-nvme/libnvme/pull
> > /647__;!!LpKI!iPR_G07L-qHIqc5s-
> 3oG52BDLqsnUGFIBzsPBWrE5yFsOala7CLY7TCF
> > aEr_pBKy0lLN91-SMXKKomo$ [github[.]com]
> > https://urldefense.com/v3/__https://github.com/linux-nvme/libnvme/issu
> > es/648__;!!LpKI!iPR_G07L-qHIqc5s-
> 3oG52BDLqsnUGFIBzsPBWrE5yFsOala7CLY7T
> > CFaEr_pBKy0lLN91-SsAY7H4g$ [github[.]com]
> >
> > My question is related to nvme-oF-TCP, but may apply to other
> > transports as well (I'm just not an expert for those other transports).
> >
> > The question is: How should nvme-cli/libnvme compare existing
> > controllers (in sysfs) to a candidate configuration to determine
> > whether to create a new controller or to reuse an existing one? >
> Well, for persistent discovery controllers we should, right?
> That's kinda the point of a _persistent_ discovery controller ...\
>
> > Background:
> >
> > The configuration parameters to create a controller are:
> > 1. --transport
> > 2. --traddr
> > 3. --trsvcid
> > 4. --nqn
> > 5. --host-traddr
> > 6. --host-iface (TCP only)
> >
> > When trying to configure a controller, nvme-cli/libnvme scans the
> > sysfs looking for an existing controller that matches the candidate
> > config. If a match is found, nvme-cli/libnvme reuses the existing
> > controller instead of creating a new one. nvme-cli/libnvme compares
> > the 6 configuration parameters listed above one-by-one to determine if
> > an existing controller matches a candidate config. Some of these parameters
> like "transport" and "traddr" can never be NULL. Others, such as "host-traddr"
> and "host-iface" (and maybe others), may be NULL when not needed.
> >
> > Currently, nvme-cli/libnvme does a "soft" comparison for "host-traddr"
> > and "host-iface". That is to say that if an existing controller was
> > created without specifying one or both parameters (i.e. set to NULL),
> > then the NULL parameter will be ignored when comparing against the
> > candidate config. This may result in nvme-cli/libnvme reusing an existing
> controller that does not match the candidate "host-traddr" and/or "host-iface".
> >
> The problem here is that 'host-traddr' and 'host-iface' are linux concepts, and
> have no equivalent in the NVMe spec.
>
> While it's easy to do a 1:1 consistency check for a simple 'connect'
> command, it gets ever so complicated for the 'connect-all' case:
> There we'll have to check each discovery log page entry against a matching
> controller in sysfs.
> As the discovery log page entry cannot contain any of the 'host-traddr'
> and 'host-iface' components, we would always default to create a new
> controller for any entry in the discovery log page if the original connection was
> specified with either of these arguments.
> Which clearly is not what we want.
> At the same time we cannot re-use the arguments from the 'connect'
> command to the original discovery controller as we might have encountered a
> referral or the DLPE might specify a different transport altogether.
Actually, when we get an AEN from a Discovery Controller and the udev rules invoke connect-all, we pass the host-traddr/host-iface of the discovery controller to the connect-all command.
>
> > Why is this important?
> >
> > We now have support for booting systems over nvme-oF-TCP. What
> > typically happens during early boot is that a controller may be
> > created over the system's management interface (i.e. the default
> > route). The management interface is typically a slower interface (e.g.
> > 1G), which is fine for booting a system. However, once the system is
> > fully booted we may want to use a different (faster) interface (e.g.
> > 10G) to connect to that same subsystem. We can force the new
> > connection on a different interface using the "host-iface" parameter.
> > However, because nvme-cli/libnvme performs a "soft" compare, it will
> > conclude that the existing (slower) controller is "good enough" and reuse it
> instead of creating a new connection.
> >
> > One more piece of info (to make things more interesting):
> >
> > When a controller is created without specifying "host-traddr" and
> > "host-iface", the kernel will decide which source-address and
> > interface to use for that connection. The interface is selected by
> > looking up the destination address
> > (traddr) in the routing table. The source address is selected by
> > retrieving the primary address assigned to that interface. It is
> > possible that the values picked by the kernel will match the host-iface/host-
> traddr of a candidate config.
> > However, because they are not exposed in the sysfs, the comparison
> > will conclude that none of the existing controllers match the
> > candidate config (when in fact there is a match). This is one of the
> > reasons why I added the "src_addr" parameter to the sysfs (kernel
> > 6.1). For TCP connections, the kernel now displays the actual source
> > address of the connection through the "src_addr" attribute. This can
> > be used to identify not only the source address, but also on which
> > interface the connection was made (by doing an interface lookup to find out
> which one matches the source address).
> >
> > In conclusion, we have 3 choices to solve the issue of matching an
> > existing controller to a candidate configuration:
> >
> > 1) We can leave everything the way it is now. In some cases,
> > nvme-cli/libnvme may erroneously conclude that an existing controller
> > is "good enough" and not allow someone to make a different connection using
> a different "host-traddr"
> > and/or "host-iface". We may want to document that behavior so that
> > people know what to expect.
>
> Why 'errorneously'? If the user has specified 'host-traddr' and/or 'host-iface'
> we'll be selecting a controller which matches these criteria. _If_ we find several
> matching controllers it means that the search criteria are insufficient, and
> obviously the user didn't care.
Yes "erroneously" (or maybe it's a bug in nvme-cli/libnvme). To simplify, let's say that there is an existing controller where neither the host-traddr or host-iface were used to make the connection. This is the case mentioned earlier where a connection was established over the management interface at boot time. And let's say the management interface is a 1G ethernet port labelled "eth0". Now let's say we have a candidate controller that absolutely wants to use a different 10G interface "eth1" to connect to the same storage appliance. To do so, we set "host-iface=eth1".
However, the current nvme-cli/libnvme code ignores a parameter if either the candidate or the existing parameter is NULL. In this case, the existing controller's host-iface is NULL and therefore nvme-cli/libnvme will completely disregard the fact that the candidate controller really wants to use "eth1". Instead, nvme-cli/libnvme will say "hey I found this existing controller on eth0 that looks pretty good to me, therefore I'll just reuse it ".
The code in question is libnvme:__nvme_lookup_ctrl() and nvme-cli:disc_ctrl_config_match()/ctrl_config_match().
Ref libnvme: https://github.com/linux-nvme/libnvme/blob/master/src/nvme/tree.c#LL1069C13-L1069C31
Ref nvme-cli: https://github.com/linux-nvme/nvme-cli/blob/master/fabrics.c#L150
The libnvme code (below) that compares the candidate controller's "host_iface" to the existing controller's "c->cfg.host_iface" will declare a match if either of those two variables is NULL. The code below only declares "no match" when *both* parameters are not NULL and the strings pointed to by those pointer are different. In all other cases we consider that the two parameters are a match.
if (host_iface && c->cfg.host_iface &&
strcmp(c->cfg.host_iface, host_iface))
continue;
I think that we should only ignore the comparison when the candidate's host-iface is NULL. However, if the candidate is not NULL, that means that the user wants something specific and we need to dig a little deeper before declaring that the existing controller is "good enough".
In other words, when a candidate parameter is NULL, that means that the user doesn't care and therefore nvme-cli/libnvme can ignore it.
>
> > 2) We can change the code so that nvme-cli/libnvme does a "strong"
> > comparison on the "host-traddr" and/or "host-iface". This may result
> > in duplicate connections because an existing controller that was
> > configured without specifying the "host-traddr" and/or "host-iface"
> > may in fact be assigned internally the same "host-traddr" and/or
> > "host-iface" of the candidate configuration and therefore could have
> > been reused.
>
> I don't think this it a good idea.
> As outlined above, we cannot have all information for the 'connect-all'
> case, so we'll always end up creating a new controller there.
>
> > 3) For the case where an existing controller was created without
> > specifying the "host-traddr" and/or "host-iface" (and therefore the
> > kernel picked those automatically), we can look at the existing
> > controller's "src_addr" attribute to determine if there is a match
> > with the candidate config's "host-traddr" and/or "host-iface". With
> > this, we are guaranteed to never have duplicate connections and we
> > will always know when an existing connection can be reused for a
> > candidate configuration. However, this requires doing a lookup of all
> > the interfaces to find out which one matches the "src_addr"
> > (the lookup can be done through the netlink interface).
> >
> Hmm. I don't think the is a different case; it's just the opposite of case 1) (ie we
> have insufficient information to compare the existing controller with).
>
> I would shoot for having a 'lazy' check, in the sense that a non-existing setting
> host-iface or host-traddr is an automatic match (as it's not specified, so clearly
> it's not important).
> That will allow for persistent discovery controllers to work 'best', and at the
> same time to allow for the admin to override the default behaviour by specifying
> all arguments.
That's pretty much what I said above. That is, when the candidate parameter is not NULL, dig a little deeper before declaring that an existing controller is a match.
>
> Cheers,
>
> Hannes
Thanks Hannes.
More information about the Linux-nvme
mailing list