[PATCH 4/5] nvme: implement multipath access to nvme subsystems
Hannes Reinecke
hare at suse.de
Thu Nov 9 07:44:32 PST 2017
On 11/02/2017 07:30 PM, Christoph Hellwig wrote:
> This patch adds native multipath support to the nvme driver. For each
> namespace we create only a single block device node, which can be used
> to access that namespace through any of the controllers that refer to it.
> The gendisk for each controller's path to the namespace still exists
> inside the kernel, but is hidden from userspace. The character device
> nodes are still available on a per-controller basis. A new link from
> the sysfs directory for the subsystem allows finding all controllers
> for a given subsystem.
>
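A minimal userspace model of the layout described above, written in C. It assumes one namespace head that owns a fixed array of per-controller paths; MAX_PATHS, the struct and function names, and the "path-cXnY" naming are hypothetical and are not taken from the patch. Each per-path disk is created hidden, so userspace only ever sees the shared head node.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical model: names and limits are illustrative only,
 * not the driver's real definitions. */
#define MAX_PATHS 4

struct path_disk {
    int ctrl_id;        /* controller this path goes through */
    bool hidden;        /* per-path disk is not exposed to userspace */
    char name[32];
};

struct ns_head {
    unsigned int nsid;  /* namespace ID shared by all paths */
    char name[32];      /* the single user-visible block device node */
    struct path_disk paths[MAX_PATHS];
    size_t npaths;
};

/* Attach a controller's path to the shared namespace head. The per-path
 * disk is created hidden. Returns 0 on success, -1 if the table is full. */
static int ns_head_add_path(struct ns_head *head, int ctrl_id)
{
    assert(head != NULL);
    if (head->npaths == MAX_PATHS)
        return -1;
    struct path_disk *p = &head->paths[head->npaths++];
    p->ctrl_id = ctrl_id;
    p->hidden = true;
    snprintf(p->name, sizeof(p->name), "path-c%dn%u", ctrl_id, head->nsid);
    return 0;
}

/* Count the device nodes userspace would see for this namespace. */
static size_t ns_head_visible_nodes(const struct ns_head *head)
{
    size_t visible = 1; /* the shared head node itself */
    for (size_t i = 0; i < head->npaths; i++)
        if (!head->paths[i].hidden)
            visible++;
    return visible;
}
```

For example, attaching controllers 0 and 1 to the same head yields two paths, while ns_head_visible_nodes() still reports a single node.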
> Currently we will always send I/O to the first available path; this will
> be changed once the NVMe Asymmetric Namespace Access (ANA) TP is
> ratified and implemented, at which point we will look at the ANA state
> for each namespace. Another possibility that was prototyped is to
> use the path that is closest to the submitting NUMA node, which will be
> mostly interesting for PCI, but might also be useful for RDMA or FC
> transports in the future. There is no plan to implement round robin
> or I/O service time path selectors, as those do not scale to the
> performance rates provided by NVMe.
>
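The two selection policies mentioned above can be sketched as follows. This is illustrative C, not the driver code: struct mp_path and both selector functions are hypothetical, and numa_distance() is an assumed placeholder (0 for the local node, 1 otherwise) rather than a real NUMA distance table.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-path state; names are illustrative only. */
struct mp_path {
    bool live;      /* controller is connected and usable */
    int numa_node;  /* node the controller is attached to */
};

/* Current policy: the first usable path wins. Returns -1 if none. */
static int select_first_live(const struct mp_path *paths, size_t n)
{
    assert(paths != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (paths[i].live)
            return (int)i;
    return -1;
}

/* Assumed stand-in for a NUMA distance lookup. */
static int numa_distance(int a, int b)
{
    return a == b ? 0 : 1;
}

/* Prototyped alternative: the usable path closest to the submitting
 * CPU's node. Ties keep the earlier path. Returns -1 if none. */
static int select_nearest_live(const struct mp_path *paths, size_t n,
                               int cpu_node)
{
    int best = -1;
    int best_dist = INT_MAX;
    for (size_t i = 0; i < n; i++) {
        if (!paths[i].live)
            continue;
        int d = numa_distance(cpu_node, paths[i].numa_node);
        if (d < best_dist) {
            best_dist = d;
            best = (int)i;
        }
    }
    return best;
}
```

Both selectors only read per-path state. A round-robin selector would instead have to update a shared cursor on every I/O, which is one plausible source of the scalability concern raised above.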
> The multipath device will go away once all paths to it disappear;
> any delay to keep it alive needs to be implemented at the controller
> level.
>
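A sketch of this lifetime rule, assuming each attached path holds one reference on the multipath node and the node is freed when the last path drops its reference. All names are hypothetical; in-kernel code would more likely use a kref with proper locking, while this plain counter is single-threaded only. No grace period is modelled, matching the statement that any delay belongs at the controller level.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical multipath node: one reference per attached path. */
struct mp_node {
    unsigned int refs;
};

/* Allocate a node with no paths yet. Returns NULL on failure. */
static struct mp_node *mp_node_create(void)
{
    return calloc(1, sizeof(struct mp_node));
}

/* A new path attaches and takes a reference. */
static void mp_node_get(struct mp_node *node)
{
    assert(node != NULL);
    node->refs++;
}

/* A path goes away. Returns true if this was the last path and the
 * node was freed; the caller must not touch the node afterwards. */
static bool mp_node_put(struct mp_node *node)
{
    assert(node != NULL && node->refs > 0);
    if (--node->refs == 0) {
        free(node);
        return true;
    }
    return false;
}
```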
> Signed-off-by: Christoph Hellwig <hch at lst.de>
> ---
>  drivers/nvme/host/Kconfig     |   9 ++
>  drivers/nvme/host/Makefile    |   1 +
>  drivers/nvme/host/core.c      | 133 +++++++++++++++++++---
>  drivers/nvme/host/multipath.c | 255 ++++++++++++++++++++++++++++++++++++++++++
>  drivers/nvme/host/nvme.h      |  57 ++++++++++
>  5 files changed, 440 insertions(+), 15 deletions(-)
>  create mode 100644 drivers/nvme/host/multipath.c
>
In general I'm okay with this approach, but would like to address two
things:
- We don't have the topology information in sysfs; while the namespace
device has the 'slaves' and 'holders' directories, they remain empty,
and the path devices don't even have those directories. I really would
like to see them populated to help tools like dracut figure out the
topology when building up a list of modules to include.
- The patch doesn't integrate with the 'claim' mechanism for block
devices, i.e. device-mapper might accidentally stumble upon it when
traversing devices.
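For the topology point, a tool like dracut would inspect the holders and slaves directories under /sys/block/<disk>/. The helper below is a hedged C sketch of that lookup using only opendir()/readdir(); the function name is hypothetical. It returns -1 when the directory is missing, as described above for the path devices, and 0 when it exists but is empty, as for the namespace device.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Count (and optionally print) entries in a sysfs topology directory such
 * as /sys/block/<disk>/holders or /sys/block/<disk>/slaves. Returns the
 * number of entries, or -1 if the directory cannot be opened. */
static int count_topology_links(const char *dir_path, int verbose)
{
    assert(dir_path != NULL);
    DIR *dir = opendir(dir_path);
    if (dir == NULL)
        return -1;

    int count = 0;
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        /* Skip the self and parent entries every directory has. */
        if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
            continue;
        if (verbose)
            printf("%s -> %s\n", dir_path, ent->d_name);
        count++;
    }
    closedir(dir);
    return count;
}
```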
I'll be sending two patches to resurrect the 'bd_link_disk_holder'
idea I posted earlier; that should take care of these issues.
If you're totally against having to access the block device, I might be
willing to look into breaking things out, so that the nvme code just
creates the symlinks and the block-device claiming code honours the
'HIDDEN' flag.
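The claim behaviour discussed above can be illustrated with a small model in which a hidden device refuses every exclusive claim, so a stacking driver walking the device list cannot grab a per-path disk. This is a sketch under stated assumptions: DEV_FLAG_HIDDEN, struct model_bdev, and the model_claim()/model_release() helpers are hypothetical stand-ins, not the block layer's claiming API, and -1 replaces a real errno such as -EBUSY.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag standing in for the 'HIDDEN' flag mentioned above. */
#define DEV_FLAG_HIDDEN 0x1u

struct model_bdev {
    unsigned int flags;
    const void *holder;  /* current exclusive owner, NULL if unclaimed */
};

/* Try to claim dev for holder. Returns 0 on success, -1 if the device is
 * hidden or already claimed by a different holder. */
static int model_claim(struct model_bdev *dev, const void *holder)
{
    assert(dev != NULL && holder != NULL);
    if (dev->flags & DEV_FLAG_HIDDEN)
        return -1;
    if (dev->holder != NULL && dev->holder != holder)
        return -1;
    dev->holder = holder;
    return 0;
}

/* Drop the claim, but only if holder is the current owner. */
static void model_release(struct model_bdev *dev, const void *holder)
{
    assert(dev != NULL);
    if (dev->holder == holder)
        dev->holder = NULL;
}
```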
Cheers,
Hannes
--
Dr. Hannes Reinecke Teamlead Storage & Networking
hare at suse.de +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
More information about the Linux-nvme mailing list