[PATCHv5 RFC 1/3] nvme-multipah: Add visibility for round-robin io-policy
Hannes Reinecke
hare at suse.de
Tue Dec 10 07:18:18 PST 2024
On 12/10/24 10:22, Daniel Wagner wrote:
> There is a typo in the subject.
>
> On Wed, Oct 30, 2024 at 04:11:41PM +0530, Nilay Shroff wrote:
>> This patch adds NVMe native multipath visibility for the round-robin
>> io-policy. It creates a "multipath" sysfs directory under the head
>> gendisk device node directory and, from that directory, adds a link
>> to each namespace path device the head node refers to.
>>
>> For instance, if we have a shared namespace accessible from two
>> different controllers/paths, then we create a soft link to each path
>> device from the head disk node, as shown below:
>>
>> $ ls -l /sys/block/nvme1n1/multipath/
>> nvme1c1n1 -> ../../../../../pci052e:78/052e:78:00.0/nvme/nvme1/nvme1c1n1
>> nvme1c3n1 -> ../../../../../pci058e:78/058e:78:00.0/nvme/nvme3/nvme1c3n1
>>
>> In the above example, nvme1n1 is head gendisk node created for a shared
>> namespace and the namespace is accessible from nvme1c1n1 and nvme1c3n1
>> paths.
>>
>> For round-robin I/O policy, we could easily infer from the above output
>> that I/O workload targeted to nvme1n1 would toggle across paths nvme1c1n1
>> and nvme1c3n1.
>>
>> Signed-off-by: Nilay Shroff <nilay at linux.ibm.com>
>> ---
>> drivers/nvme/host/core.c | 3 ++
>> drivers/nvme/host/multipath.c | 81 +++++++++++++++++++++++++++++++++++
>> drivers/nvme/host/nvme.h | 18 ++++++--
>> drivers/nvme/host/sysfs.c | 14 ++++++
>> 4 files changed, 112 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>> index 84cb859a911d..c923bd2c5468 100644
>> --- a/drivers/nvme/host/core.c
>> +++ b/drivers/nvme/host/core.c
>> @@ -3940,6 +3940,9 @@ static void nvme_ns_remove(struct nvme_ns *ns)
>>
>> if (!nvme_ns_head_multipath(ns->head))
>> nvme_cdev_del(&ns->cdev, &ns->cdev_device);
>> +
>> + nvme_mpath_remove_sysfs_link(ns);
>> +
>> del_gendisk(ns->disk);
>>
>> mutex_lock(&ns->ctrl->namespaces_lock);
>> diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
>> index 6a15873055b9..43c87a946cc7 100644
>> --- a/drivers/nvme/host/multipath.c
>> +++ b/drivers/nvme/host/multipath.c
>> @@ -682,6 +682,8 @@ static void nvme_mpath_set_live(struct nvme_ns *ns)
>> kblockd_schedule_work(&head->partition_scan_work);
>> }
>>
>> + nvme_mpath_add_sysfs_link(ns->head);
>> +
>> mutex_lock(&head->lock);
>> if (nvme_path_is_optimized(ns)) {
>> int node, srcu_idx;
>> @@ -764,6 +766,25 @@ static void nvme_update_ns_ana_state(struct nvme_ana_group_desc *desc,
>> if (nvme_state_is_live(ns->ana_state) &&
>> nvme_ctrl_state(ns->ctrl) == NVME_CTRL_LIVE)
>> nvme_mpath_set_live(ns);
>> + else {
>> +		/*
>> +		 * Add a sysfs link from the multipath head gendisk node to
>> +		 * the path device gendisk node.
>> +		 * If the path's ANA state is live (i.e. optimized or
>> +		 * non-optimized) when the ns is allocated, the sysfs link is
>> +		 * created from nvme_mpath_set_live() and we do not fall
>> +		 * through to this code path. For any other ANA state,
>> +		 * nvme_mpath_set_live() is called only after the state has
>> +		 * transitioned to live. However, we still want the sysfs
>> +		 * link from the head node to the path device irrespective
>> +		 * of the path's ANA state.
>> +		 * So if we reach here, the path's ANA state is not live;
>> +		 * create the sysfs link anyway, provided the head node of
>> +		 * the path has already come alive.
>> +		 */
>> + if (test_bit(NVME_NSHEAD_DISK_LIVE, &ns->head->flags))
>> + nvme_mpath_add_sysfs_link(ns->head);
>> + }
>
> Remind me, why do we add the link here? Couldn't we add it earlier and
> so we don't have to check the ctrl state? At least I read this here that
> we want to add the link independent of the ctrl state.
>
> Or do you want to say that whenever the link exists the path is usable?
> I'm trying to figure out whether this is a good idea when nvme-cli/libnvme
> iterates over sysfs and races with this: what's the user experience?
>
> I am sorry if this has already been discussed. I just completely lost
> the state :)
nvme_mpath_set_live() is the culprit.
We only present the nvme head device node to userspace once the first
path goes live.
That adds a bit of complexity when creating the links: we cannot create
the symlink at the moment a new path is registered, because the head
node may not exist in sysfs yet.
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare at suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich