[PATCHv5 RFC 1/3] nvme-multipah: Add visibility for round-robin io-policy

Hannes Reinecke hare at suse.de
Tue Dec 10 07:18:18 PST 2024


On 12/10/24 10:22, Daniel Wagner wrote:
> There is a typo in the subject.
> 
> On Wed, Oct 30, 2024 at 04:11:41PM +0530, Nilay Shroff wrote:
>> This patch adds nvme native multipath visibility for the round-robin
>> io-policy. It creates a "multipath" sysfs directory under the head gendisk
>> device node directory, and from that "multipath" directory it adds a link
>> to each namespace path device the head node refers to.
>>
>> For instance, if we have a shared namespace accessible from two different
>> controllers/paths, then we create a soft link to each path device from the
>> head disk node, as shown below:
>>
>> $ ls -l /sys/block/nvme1n1/multipath/
>> nvme1c1n1 -> ../../../../../pci052e:78/052e:78:00.0/nvme/nvme1/nvme1c1n1
>> nvme1c3n1 -> ../../../../../pci058e:78/058e:78:00.0/nvme/nvme3/nvme1c3n1
>>
>> In the above example, nvme1n1 is the head gendisk node created for a
>> shared namespace, and the namespace is accessible through the nvme1c1n1
>> and nvme1c3n1 paths.
>>
>> For the round-robin I/O policy, we can easily infer from the above output
>> that an I/O workload targeted at nvme1n1 would alternate across the paths
>> nvme1c1n1 and nvme1c3n1.
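
For illustration only, here is a rough kernel-side sketch of how such a
per-path link could be wired up with the stock sysfs group helpers
(sysfs_create_group() / sysfs_add_link_to_group()). The demo_* names and
the empty attribute group are assumptions of this sketch; the series'
actual helper is nvme_mpath_add_sysfs_link() and may be structured
differently:

/* Sketch only, not the patch's code. */
static struct attribute *demo_mpath_attrs[] = {
	NULL,	/* no plain attributes, just the directory for the links */
};

static const struct attribute_group demo_mpath_group = {
	.name	= "multipath",	/* /sys/block/<head>/multipath/ */
	.attrs	= demo_mpath_attrs,
};

/* Called once, when the head gendisk is registered. */
static int demo_create_mpath_dir(struct nvme_ns_head *head)
{
	return sysfs_create_group(&disk_to_dev(head->disk)->kobj,
				  &demo_mpath_group);
}

/* Called per path, e.g. nvme1c1n1 -> .../nvme1/nvme1c1n1 */
static int demo_add_path_link(struct nvme_ns *ns)
{
	return sysfs_add_link_to_group(&disk_to_dev(ns->head->disk)->kobj,
				       demo_mpath_group.name,
				       &disk_to_dev(ns->disk)->kobj,
				       ns->disk->disk_name);
}

In practice the group could also be handed to device_add_disk() through
its groups argument instead of a separate sysfs_create_group() call.
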
>>
>> Signed-off-by: Nilay Shroff <nilay at linux.ibm.com>
>> ---
>>   drivers/nvme/host/core.c      |  3 ++
>>   drivers/nvme/host/multipath.c | 81 +++++++++++++++++++++++++++++++++++
>>   drivers/nvme/host/nvme.h      | 18 ++++++--
>>   drivers/nvme/host/sysfs.c     | 14 ++++++
>>   4 files changed, 112 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>> index 84cb859a911d..c923bd2c5468 100644
>> --- a/drivers/nvme/host/core.c
>> +++ b/drivers/nvme/host/core.c
>> @@ -3940,6 +3940,9 @@ static void nvme_ns_remove(struct nvme_ns *ns)
>>   
>>   	if (!nvme_ns_head_multipath(ns->head))
>>   		nvme_cdev_del(&ns->cdev, &ns->cdev_device);
>> +
>> +	nvme_mpath_remove_sysfs_link(ns);
>> +
>>   	del_gendisk(ns->disk);
>>   
>>   	mutex_lock(&ns->ctrl->namespaces_lock);
>> diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
>> index 6a15873055b9..43c87a946cc7 100644
>> --- a/drivers/nvme/host/multipath.c
>> +++ b/drivers/nvme/host/multipath.c
>> @@ -682,6 +682,8 @@ static void nvme_mpath_set_live(struct nvme_ns *ns)
>>   		kblockd_schedule_work(&head->partition_scan_work);
>>   	}
>>   
>> +	nvme_mpath_add_sysfs_link(ns->head);
>> +
>>   	mutex_lock(&head->lock);
>>   	if (nvme_path_is_optimized(ns)) {
>>   		int node, srcu_idx;
>> @@ -764,6 +766,25 @@ static void nvme_update_ns_ana_state(struct nvme_ana_group_desc *desc,
>>   	if (nvme_state_is_live(ns->ana_state) &&
>>   	    nvme_ctrl_state(ns->ctrl) == NVME_CTRL_LIVE)
>>   		nvme_mpath_set_live(ns);
>> +	else {
>> +		/*
>> +		 * Add a sysfs link from the multipath head gendisk node to
>> +		 * the path device gendisk node.
>> +		 * If the path's ana state is live (i.e. optimized or
>> +		 * non-optimized) when the ns is allocated, the sysfs link is
>> +		 * created from nvme_mpath_set_live() and we never fall
>> +		 * through to this code path. For any other ana state,
>> +		 * nvme_mpath_set_live() is only called once the state
>> +		 * transitions to live, but we still want the sysfs link from
>> +		 * the head node to the path device irrespective of the
>> +		 * path's ana state.
>> +		 * So if we reach this point the path's ana state is not
>> +		 * live; create the sysfs link to this path anyway, provided
>> +		 * the head node has already come alive.
>> +		 */
>> +		if (test_bit(NVME_NSHEAD_DISK_LIVE, &ns->head->flags))
>> +			nvme_mpath_add_sysfs_link(ns->head);
>> +	}
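
For completeness, a similarly hedged sketch of the teardown counterpart,
pairing with the nvme_mpath_remove_sysfs_link() call in the core.c hunk
above; the "multipath" group name and demo_* naming come from the sketch
earlier in this mail, not from the series itself:

static void demo_remove_path_link(struct nvme_ns *ns)
{
	struct nvme_ns_head *head = ns->head;

	/* If the head disk never went live, no link was ever created. */
	if (!head->disk || !test_bit(NVME_NSHEAD_DISK_LIVE, &head->flags))
		return;

	sysfs_remove_link_from_group(&disk_to_dev(head->disk)->kobj,
				     "multipath", ns->disk->disk_name);
}
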
> 
> Remind me, why do we add the link here? Couldn't we add it earlier, so
> that we don't have to check the ctrl state? At least my reading here is
> that we want to add the link independent of the ctrl state.
> 
> Or do you want to say that whenever the link exists the path is usable?
> I'm trying to figure out whether that is a good idea when nvme-cli/libnvme
> iterates over sysfs and races with this: what's the user experience?
> 
> I am sorry if this has already been discussed. I just completely lost
> the state :)

nvme_mpath_set_live() is the culprit.

We only present the nvme head device node to userspace once the first
path goes live.

That injects a bit of complexity when creating the links: we cannot
create the symlink as soon as a new path is registered.
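
Since the head node (and with it the multipath/ directory) only shows up
in sysfs once the first path goes live, a tool walking sysfs has to treat
a missing or still-empty directory as "no usable path yet" rather than as
an error. A minimal userspace sketch of that (hypothetical scanner, not
nvme-cli/libnvme code):

#include <dirent.h>
#include <errno.h>
#include <stdio.h>

static void list_paths(const char *head)	/* e.g. "nvme1n1" */
{
	char dirpath[256];
	struct dirent *de;
	DIR *dir;

	snprintf(dirpath, sizeof(dirpath), "/sys/block/%s/multipath", head);

	dir = opendir(dirpath);
	if (!dir) {
		/* The directory appears only once the first path is live. */
		if (errno == ENOENT)
			printf("%s: no live path yet\n", head);
		return;
	}

	while ((de = readdir(dir)) != NULL) {
		if (de->d_name[0] == '.')
			continue;	/* skip "." and ".." */
		printf("%s: path %s\n", head, de->d_name);
	}
	closedir(dir);
}

int main(int argc, char **argv)
{
	list_paths(argc > 1 ? argv[1] : "nvme1n1");
	return 0;
}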

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare at suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich


