[PATCHv4 RFC 1/1] nvme-multipath: Add sysfs attributes for showing multipath info

Hannes Reinecke hare at suse.de
Mon Oct 7 03:14:10 PDT 2024


On 9/11/24 08:26, Nilay Shroff wrote:
> NVMe native multipath supports different I/O policies for selecting the
> I/O path; however, we don't have any visibility into which path is being
> selected by the multipath code for forwarding I/O.
> This patch adds that visibility through new sysfs attribute files named
> "numa_nodes" and "queue_depth" under each namespace block device path
> /sys/block/nvmeXcYnZ/. We also create a "multipath" sysfs directory under
> the head disk node and, from this directory, add a link to each namespace
> path device the head disk node points to.
> 
> For instance, /sys/block/nvmeXnY/multipath/ would contain a soft link to
> each path the head disk node <nvmeXnY> points to:
> 
> $ ls -l /sys/block/nvme1n1/multipath/
> nvme1c1n1 -> ../../../../../pci052e:78/052e:78:00.0/nvme/nvme1/nvme1c1n1
> nvme1c3n1 -> ../../../../../pci058e:78/058e:78:00.0/nvme/nvme3/nvme1c3n1
> 
> For the round-robin I/O policy, we can easily infer from the above output
> that an I/O workload targeted at nvme1n1 would alternate between the paths
> nvme1c1n1 and nvme1c3n1.
> 
> For the numa I/O policy, the "numa_nodes" attribute file shows the NUMA
> nodes preferred by the respective block device path. The value is a
> comma-delimited list of nodes, or an A-B range of nodes.
> 
> For the queue-depth I/O policy, the "queue_depth" attribute file shows
> the number of active/in-flight I/O requests currently queued for each path.
> 
> Signed-off-by: Nilay Shroff <nilay at linux.ibm.com>
> ---
>   drivers/nvme/host/core.c      |  3 ++
>   drivers/nvme/host/multipath.c | 71 +++++++++++++++++++++++++++++++++++
>   drivers/nvme/host/nvme.h      | 20 ++++++++--
>   drivers/nvme/host/sysfs.c     | 20 ++++++++++
>   4 files changed, 110 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 983909a600ad..6be29fd64236 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -3951,6 +3951,9 @@ static void nvme_ns_remove(struct nvme_ns *ns)
>   
>   	if (!nvme_ns_head_multipath(ns->head))
>   		nvme_cdev_del(&ns->cdev, &ns->cdev_device);
> +
> +	nvme_mpath_remove_sysfs_link(ns);
> +
>   	del_gendisk(ns->disk);
>   
>   	mutex_lock(&ns->ctrl->namespaces_lock);
> diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
> index 518e22dd4f9b..7d9c36a7a261 100644
> --- a/drivers/nvme/host/multipath.c
> +++ b/drivers/nvme/host/multipath.c
> @@ -654,6 +654,8 @@ static void nvme_mpath_set_live(struct nvme_ns *ns)
>   		nvme_add_ns_head_cdev(head);
>   	}
>   
> +	nvme_mpath_add_sysfs_link(ns);
> +
>   	mutex_lock(&head->lock);
>   	if (nvme_path_is_optimized(ns)) {
>   		int node, srcu_idx;
Nearly there.

You can only call 'nvme_mpath_add_sysfs_link()' if the gendisk on the
head has been created.
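
Something like the below would do as a guard (untested sketch; I'm
creating the link directly under the head gendisk via sysfs_create_link()
for brevity, whereas your patch groups the links under the "multipath"
directory):

static void nvme_mpath_add_sysfs_link(struct nvme_ns *ns)
{
        struct nvme_ns_head *head = ns->head;
        struct device *target = disk_to_dev(ns->disk);

        /* nothing to link to unless the head gendisk has been registered */
        if (!test_bit(NVME_NSHEAD_DISK_LIVE, &head->flags))
                return;

        if (sysfs_create_link(&disk_to_dev(head->disk)->kobj,
                              &target->kobj, dev_name(target)))
                dev_warn(target, "failed to create multipath link\n");
}

With that, nvme_mpath_add_sysfs_link() can be called without having to
reason about whether the head gendisk already exists at each call site.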

And there is one branch in nvme_mpath_add_disk():

                 if (desc.state) {
                         /* found the group desc: update */
                         nvme_update_ns_ana_state(&desc, ns);

which does not go via nvme_mpath_set_live(), yet a device link would
need to be created here, too.
But you can't call nvme_mpath_add_sysfs_link() from 
nvme_mpath_add_disk(), as the actual gendisk might only be created
later on during ANA log parsing.
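
Just to spell out the two conditions (purely illustrative, helper name
made up):

static bool nvme_mpath_can_add_link(struct nvme_ns *ns)
{
        /* the head gendisk must have been registered ... */
        if (!test_bit(NVME_NSHEAD_DISK_LIVE, &ns->head->flags))
                return false;

        /* ... and the path must be usable according to its ANA state */
        return nvme_state_is_live(ns->ana_state);
}

Folding a check like this into nvme_mpath_add_sysfs_link() would make it
safe to call from anywhere, but the branch above would then silently skip
the link, and something would still have to create it once the head
gendisk eventually shows up.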

It is a tangle, and I haven't found a good way out of this.
But I am _very much_ in favour of having these links, so please
update your patch.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare at suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich


