[PATCHv5 RFC 1/3] nvme-multipah: Add visibility for round-robin io-policy

Daniel Wagner dwagner at suse.de
Tue Dec 10 01:22:27 PST 2024


There is a typo in the subject: "nvme-multipah" should be "nvme-multipath".

On Wed, Oct 30, 2024 at 04:11:41PM +0530, Nilay Shroff wrote:
> This patch adds NVMe native multipath visibility for the round-robin
> io-policy. It creates a "multipath" sysfs directory under the head
> gendisk device node directory, and from that "multipath" directory it
> adds a link to each namespace path device the head node refers to.
> 
> For instance, if we have a shared namespace accessible from two different
> controllers/paths, then we create a soft link to each path device from
> the head disk node, as shown below:
> 
> $ ls -l /sys/block/nvme1n1/multipath/
> nvme1c1n1 -> ../../../../../pci052e:78/052e:78:00.0/nvme/nvme1/nvme1c1n1
> nvme1c3n1 -> ../../../../../pci058e:78/058e:78:00.0/nvme/nvme3/nvme1c3n1
> 
> In the above example, nvme1n1 is the head gendisk node created for a
> shared namespace, and the namespace is accessible through the nvme1c1n1
> and nvme1c3n1 paths.
> 
> For the round-robin I/O policy, we can easily infer from the above
> output that I/O targeted at nvme1n1 would alternate between the
> nvme1c1n1 and nvme1c3n1 paths.
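
For anyone skimming the archive: here is a rough sketch of how such a
per-path link could be wired up (my own illustration, not necessarily
what nvme_mpath_add_sysfs_link() does internally), assuming the
"multipath" group is registered with the head gendisk's attribute
groups:

	/* illustration only: adds /sys/block/<head>/multipath/<path> */
	static void demo_add_path_link(struct nvme_ns_head *head,
				       struct nvme_ns *ns)
	{
		struct kobject *head_kobj = &disk_to_dev(head->disk)->kobj;
		struct kobject *path_kobj = &disk_to_dev(ns->disk)->kobj;

		/* link name is the path device name, e.g. nvme1c1n1 */
		if (sysfs_add_link_to_group(head_kobj, "multipath", path_kobj,
					    ns->disk->disk_name))
			dev_warn(disk_to_dev(head->disk),
				 "failed to add multipath link for %s\n",
				 ns->disk->disk_name);
	}
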
> 
> Signed-off-by: Nilay Shroff <nilay at linux.ibm.com>
> ---
>  drivers/nvme/host/core.c      |  3 ++
>  drivers/nvme/host/multipath.c | 81 +++++++++++++++++++++++++++++++++++
>  drivers/nvme/host/nvme.h      | 18 ++++++--
>  drivers/nvme/host/sysfs.c     | 14 ++++++
>  4 files changed, 112 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 84cb859a911d..c923bd2c5468 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -3940,6 +3940,9 @@ static void nvme_ns_remove(struct nvme_ns *ns)
>  
>  	if (!nvme_ns_head_multipath(ns->head))
>  		nvme_cdev_del(&ns->cdev, &ns->cdev_device);
> +
> +	nvme_mpath_remove_sysfs_link(ns);
> +
>  	del_gendisk(ns->disk);
>  
>  	mutex_lock(&ns->ctrl->namespaces_lock);
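
And on the teardown side, I'd expect nvme_mpath_remove_sysfs_link() to
boil down to something like this (again just my sketch of the idea; the
actual helper may differ):

	/* illustration only: drop the per-path link before del_gendisk() */
	static void demo_remove_path_link(struct nvme_ns_head *head,
					  struct nvme_ns *ns)
	{
		sysfs_remove_link_from_group(&disk_to_dev(head->disk)->kobj,
					     "multipath",
					     ns->disk->disk_name);
	}
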
> diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
> index 6a15873055b9..43c87a946cc7 100644
> --- a/drivers/nvme/host/multipath.c
> +++ b/drivers/nvme/host/multipath.c
> @@ -682,6 +682,8 @@ static void nvme_mpath_set_live(struct nvme_ns *ns)
>  		kblockd_schedule_work(&head->partition_scan_work);
>  	}
>  
> +	nvme_mpath_add_sysfs_link(ns->head);
> +
>  	mutex_lock(&head->lock);
>  	if (nvme_path_is_optimized(ns)) {
>  		int node, srcu_idx;
> @@ -764,6 +766,25 @@ static void nvme_update_ns_ana_state(struct nvme_ana_group_desc *desc,
>  	if (nvme_state_is_live(ns->ana_state) &&
>  	    nvme_ctrl_state(ns->ctrl) == NVME_CTRL_LIVE)
>  		nvme_mpath_set_live(ns);
> +	else {
> +		/*
> +		 * Add a sysfs link from the multipath head gendisk node to
> +		 * the path device gendisk node.
> +		 * If the path's ANA state is live (i.e. either optimized or
> +		 * non-optimized) when the ns is allocated, the sysfs link is
> +		 * created from nvme_mpath_set_live() and we don't fall
> +		 * through to this code path. For any other ANA state,
> +		 * nvme_mpath_set_live() is called only after the ANA state
> +		 * transitions to live, but we still want the sysfs link from
> +		 * the head node to the path device irrespective of the
> +		 * path's ANA state.
> +		 * So if we get here, the path's ANA state is not live; still
> +		 * create the sysfs link to this path from the head node,
> +		 * provided the head node has already come alive.
> +		 */
> +		if (test_bit(NVME_NSHEAD_DISK_LIVE, &ns->head->flags))
> +			nvme_mpath_add_sysfs_link(ns->head);
> +	}

Remind me, why do we add the link here? Couldn't we add it earlier, so
we don't have to check the ctrl state? At least I read this as saying
that we want to add the link independent of the ctrl state.

Or do you want to say that whenever the link exists, the path is
usable? I'm trying to figure out whether that is a good idea when
nvme-cli/libnvme iterates over sysfs and races with this: what's the
user experience?
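
To make the concern concrete: a tool doing something like the minimal
walk below (just an illustration, not actual nvme-cli code) would
happily report whatever links happen to exist at that instant, whether
or not the path is usable yet:

	#include <dirent.h>
	#include <stdio.h>
	#include <string.h>

	int main(void)
	{
		DIR *dp = opendir("/sys/block/nvme1n1/multipath");
		struct dirent *d;

		if (!dp)
			return 1;
		while ((d = readdir(dp))) {
			if (!strcmp(d->d_name, ".") ||
			    !strcmp(d->d_name, ".."))
				continue;
			printf("path: %s\n", d->d_name);
		}
		closedir(dp);
		return 0;
	}
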

I am sorry if this has already been discussed. I just completely lost
the state :)


