nvme-cli/libnvme: Controller reuse policy

Belanger, Martin Martin.Belanger at dell.com
Tue Jun 6 12:49:49 PDT 2023


I came across an issue using nvme-cli/libnvme. It was suggested that I move parts of the discussion to this mailing list.

The relevant info can be found in the libnvme repo here:
https://github.com/linux-nvme/libnvme/pull/647
https://github.com/linux-nvme/libnvme/issues/648

My question is related to NVMe-oF over TCP, but it may apply to other transports as well (I'm not an expert on those other transports).

The question is: How should nvme-cli/libnvme compare existing controllers (in sysfs) to a candidate configuration to determine whether to create a new controller or to reuse an existing one?

Background:

The configuration parameters to create a controller are:
1. --transport
2. --traddr
3. --trsvcid
4. --nqn
5. --host-traddr
6. --host-iface (TCP only)

When trying to configure a controller, nvme-cli/libnvme scans sysfs looking for an existing controller that matches the candidate config. If a match is found, nvme-cli/libnvme reuses the existing controller instead of creating a new one. To decide whether an existing controller matches, nvme-cli/libnvme compares the 6 configuration parameters listed above one by one. Some of these parameters, such as "transport" and "traddr", can never be NULL. Others, such as "host-traddr" and "host-iface" (and maybe others), may be NULL when not needed.

Currently, nvme-cli/libnvme does a "soft" comparison for "host-traddr" and "host-iface". That is to say that if an existing controller was created without specifying one or both parameters (i.e. set to NULL), then the NULL parameter will be ignored when comparing against the candidate config. This may result in nvme-cli/libnvme reusing an existing controller that does not match the candidate "host-traddr" and/or "host-iface".
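To make the "soft" comparison concrete, here is a minimal Python sketch of the behavior described above. It is not the libnvme code: the CtrlConfig type, the soft_equal/soft_match helpers, and the example address and NQN are all hypothetical names and values chosen for illustration. It only models the case stated above (a NULL value on the existing controller is ignored); how a NULL on the candidate side should be treated is not modeled.

```python
from typing import NamedTuple, Optional


class CtrlConfig(NamedTuple):
    """Hypothetical container for the 6 parameters listed above."""
    transport: str
    traddr: str
    trsvcid: Optional[str]
    nqn: str
    host_traddr: Optional[str] = None
    host_iface: Optional[str] = None


def soft_equal(existing: Optional[str], candidate: Optional[str]) -> bool:
    # "Soft" rule: a parameter left unset (None) on the existing
    # controller is ignored, so it matches any candidate value.
    return existing is None or existing == candidate


def soft_match(existing: CtrlConfig, candidate: CtrlConfig) -> bool:
    return (
        existing.transport == candidate.transport
        and existing.traddr == candidate.traddr
        and existing.trsvcid == candidate.trsvcid
        and existing.nqn == candidate.nqn
        and soft_equal(existing.host_traddr, candidate.host_traddr)
        and soft_equal(existing.host_iface, candidate.host_iface)
    )


# Controller created at boot without host-traddr/host-iface.
boot = CtrlConfig("tcp", "192.168.1.10", "4420", "nqn.2023-06.example:subsys1")
# Candidate asking for the same subsystem on a specific interface.
fast = boot._replace(host_iface="eth1")

print(soft_match(boot, fast))  # True: the boot controller is reused
```

The last line shows the problem: the candidate explicitly asks for "eth1", yet the existing controller is considered a match because its own "host-iface" is unset.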

Why is this important?

We now have support for booting systems over NVMe-oF over TCP. What typically happens during early boot is that a controller is created over the system's management interface (i.e. the interface used by the default route). The management interface is typically a slower interface (e.g. 1G), which is fine for booting a system. However, once the system is fully booted, we may want to use a different (faster) interface (e.g. 10G) to connect to that same subsystem. We can force the new connection onto a different interface using the "host-iface" parameter. However, because nvme-cli/libnvme performs a "soft" compare, it will conclude that the existing (slower) controller is "good enough" and reuse it instead of creating a new connection.

One more piece of info (to make things more interesting):

When a controller is created without specifying "host-traddr" and "host-iface", the kernel decides which source address and interface to use for that connection. The interface is selected by looking up the destination address (traddr) in the routing table. The source address is selected by retrieving the primary address assigned to that interface. It is possible that the values picked by the kernel match the host-iface/host-traddr of a candidate config. However, because the kernel-selected values were not exposed in sysfs, the comparison concludes that none of the existing controllers match the candidate config (when in fact there is a match). This is one of the reasons why I added the "src_addr" parameter to sysfs (kernel 6.1). For TCP connections, the kernel now displays the actual source address of the connection through the "src_addr" attribute. This can be used to identify not only the source address, but also the interface on which the connection was made (by doing an interface lookup to find out which one owns the source address).
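As a rough illustration of the interface lookup mentioned above, here is a small Python sketch. It assumes the controller's sysfs address string is a comma-separated list of key=value pairs that includes "src_addr" (the exact string below is an example I made up, not captured output), and it takes the interface-to-address table as a plain dictionary. In practice that table would have to be collected from the system, for example through netlink; that part is not shown. The function names are hypothetical.

```python
import ipaddress
from typing import Dict, List, Optional


def parse_address_attr(text: str) -> Dict[str, str]:
    """Split a 'key=value,key=value' string into a dictionary."""
    fields: Dict[str, str] = {}
    for item in text.strip().split(","):
        key, sep, value = item.partition("=")
        if sep:  # skip anything that is not key=value
            fields[key.strip()] = value.strip()
    return fields


def iface_for_src_addr(
    src_addr: str, iface_addrs: Dict[str, List[str]]
) -> Optional[str]:
    """Return the interface that owns src_addr, or None if none does."""
    wanted = ipaddress.ip_address(src_addr)
    for iface, addrs in iface_addrs.items():
        if any(ipaddress.ip_address(a) == wanted for a in addrs):
            return iface
    return None


# Example values (assumptions, for illustration only).
attr = "traddr=192.168.1.10,trsvcid=4420,src_addr=192.168.1.5"
ifaces = {"eth0": ["192.168.1.5"], "eth1": ["10.0.0.5"]}

fields = parse_address_attr(attr)
print(fields["src_addr"])                              # 192.168.1.5
print(iface_for_src_addr(fields["src_addr"], ifaces))  # eth0
```

Addresses are compared as parsed IP addresses rather than as strings, so that different textual spellings of the same address still compare equal.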

In conclusion, we have 3 choices to solve the issue of matching an existing controller to a candidate configuration:

1) We can leave everything the way it is now. In some cases, nvme-cli/libnvme may erroneously conclude that an existing controller is "good enough" and not allow someone to make a different connection using a different "host-traddr" and/or "host-iface". We may want to document that behavior so that people know what to expect.
2) We can change the code so that nvme-cli/libnvme does a "strong" comparison on the "host-traddr" and/or "host-iface". This may result in duplicate connections, because an existing controller that was configured without specifying the "host-traddr" and/or "host-iface" may in fact have been assigned internally the same "host-traddr" and/or "host-iface" as the candidate configuration, and therefore could have been reused.
3) For the case where an existing controller was created without specifying the "host-traddr" and/or "host-iface" (and therefore the kernel picked those automatically), we can look at the existing controller's "src_addr" attribute to determine if there is a match with the candidate config's "host-traddr" and/or "host-iface". With this, we are guaranteed to never have duplicate connections and we will always know when an existing connection can be reused for a candidate configuration. However, this requires doing a lookup of all the interfaces to find out which one matches the "src_addr" (the lookup can be done through the netlink interface).
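Option 3 could be sketched roughly as follows, in Python for brevity. This is only an illustration of the idea, not proposed libnvme code. All names are hypothetical, and two points are my own assumptions rather than part of the proposal: a parameter left unset in the candidate places no constraint, and when "src_addr" is unavailable (e.g. an older kernel) an unset parameter on the existing controller is treated as a mismatch. The interface table is passed in as a dictionary; collecting it (e.g. via netlink) is not shown, and addresses are compared as plain strings for simplicity.

```python
from typing import Dict, List, Optional, Tuple


def effective_host_params(
    existing: Dict[str, Optional[str]], iface_addrs: Dict[str, List[str]]
) -> Tuple[Optional[str], Optional[str]]:
    """Fill in unset host parameters from what the kernel actually chose."""
    src_addr = existing.get("src_addr")
    host_traddr = existing.get("host_traddr") or src_addr
    host_iface = existing.get("host_iface")
    if host_iface is None and src_addr is not None:
        # Find the interface that owns the kernel-selected source address.
        host_iface = next(
            (name for name, addrs in iface_addrs.items() if src_addr in addrs),
            None,
        )
    return host_traddr, host_iface


def host_params_match(
    existing: Dict[str, Optional[str]],
    candidate: Dict[str, Optional[str]],
    iface_addrs: Dict[str, List[str]],
) -> bool:
    eff_traddr, eff_iface = effective_host_params(existing, iface_addrs)
    cand_traddr = candidate.get("host_traddr")
    cand_iface = candidate.get("host_iface")
    # Assumption: only parameters the candidate specifies are compared.
    if cand_traddr is not None and cand_traddr != eff_traddr:
        return False
    if cand_iface is not None and cand_iface != eff_iface:
        return False
    return True


ifaces = {"eth0": ["192.168.1.5"], "eth1": ["10.0.0.5"]}
# Boot-time controller: nothing specified, kernel picked 192.168.1.5.
boot = {"host_traddr": None, "host_iface": None, "src_addr": "192.168.1.5"}

print(host_params_match(boot, {"host_iface": "eth1"}, ifaces))  # False
print(host_params_match(boot, {"host_iface": "eth0"}, ifaces))  # True
```

With this approach, the boot-time controller is no longer reused for a candidate that asks for "eth1", but it is still recognized as a match for a candidate that asks for "eth0", which avoids the duplicate connection of option 2.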

What are your thoughts?

Best regards,
Martin Belanger
Dell Technologies, Inc.


