[PATCHv4 RFC 1/1] nvme-multipath: Add sysfs attributes for showing multipath info
Nilay Shroff
nilay at linux.ibm.com
Mon Oct 7 06:47:30 PDT 2024
On 10/7/24 15:44, Hannes Reinecke wrote:
> On 9/11/24 08:26, Nilay Shroff wrote:
>> NVMe native multipath supports different I/O policies for selecting the I/O
>> path; however, we don't have any visibility into which path is being
>> selected by the multipath code for forwarding I/O.
>> This patch adds that visibility through new sysfs attribute files
>> named "numa_nodes" and "queue_depth" under each namespace block device
>> path /sys/block/nvmeXcYnZ/. We also create a "multipath" sysfs directory
>> under the head disk node and, from this directory, add a link to each
>> namespace path device this head disk node points to.
>>
>> For instance, /sys/block/nvmeXnY/multipath/ would contain a soft link to
>> each path the head disk node <nvmeXnY> points to:
>>
>> $ ls -l /sys/block/nvme1n1/multipath/
>> nvme1c1n1 -> ../../../../../pci052e:78/052e:78:00.0/nvme/nvme1/nvme1c1n1
>> nvme1c3n1 -> ../../../../../pci058e:78/058e:78:00.0/nvme/nvme3/nvme1c3n1
>>
>> For the round-robin I/O policy, we can easily infer from the above output
>> that an I/O workload targeted at nvme1n1 would toggle between the paths
>> nvme1c1n1 and nvme1c3n1.
>>
>> For the numa I/O policy, the "numa_nodes" attribute file shows the NUMA
>> nodes preferred by the respective block device path. The value is a
>> comma-delimited list of nodes and/or A-B ranges of nodes.
>>
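[Illustration only: the attribute implementations themselves sit in the parts of
multipath.c elided from the quoted diff below. A show() callback along these lines
would produce that output format, reusing the existing head->srcu /
head->current_path[] machinery and the kernel's "%*pbl" bitmap-list format; treat
it as a sketch, not necessarily the patch's exact code.]

static ssize_t numa_nodes_show(struct device *dev,
		struct device_attribute *attr, char *buf)
{
	struct nvme_ns *ns = nvme_get_ns_from_dev(dev);
	struct nvme_ns_head *head = ns->head;
	nodemask_t numa_nodes;
	struct nvme_ns *cur;
	int node, srcu_idx;

	nodes_clear(numa_nodes);

	/* collect the nodes whose preferred (current) path is this ns */
	srcu_idx = srcu_read_lock(&head->srcu);
	for_each_node(node) {
		cur = srcu_dereference(head->current_path[node], &head->srcu);
		if (cur == ns)
			node_set(node, numa_nodes);
	}
	srcu_read_unlock(&head->srcu, srcu_idx);

	/* "%*pbl" prints a bitmap as a comma-delimited list with A-B ranges */
	return sysfs_emit(buf, "%*pbl\n", nodemask_pr_args(&numa_nodes));
}
static DEVICE_ATTR_RO(numa_nodes);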
>> For the queue-depth I/O policy, the "queue_depth" attribute file shows the
>> number of active/in-flight I/O requests currently queued to each path.
>>
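[Again for illustration only: assuming the per-controller "nr_active" counter that
the queue-depth iopolicy work maintains around each multipath request, the
attribute could be backed by something as simple as the sketch below, which is not
necessarily the patch's exact code.]

static ssize_t queue_depth_show(struct device *dev,
		struct device_attribute *attr, char *buf)
{
	struct nvme_ns *ns = nvme_get_ns_from_dev(dev);

	/* nr_active is incremented/decremented around each multipath I/O */
	return sysfs_emit(buf, "%d\n", atomic_read(&ns->ctrl->nr_active));
}
static DEVICE_ATTR_RO(queue_depth);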
>> Signed-off-by: Nilay Shroff <nilay at linux.ibm.com>
>> ---
>> drivers/nvme/host/core.c | 3 ++
>> drivers/nvme/host/multipath.c | 71 +++++++++++++++++++++++++++++++++++
>> drivers/nvme/host/nvme.h | 20 ++++++++--
>> drivers/nvme/host/sysfs.c | 20 ++++++++++
>> 4 files changed, 110 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>> index 983909a600ad..6be29fd64236 100644
>> --- a/drivers/nvme/host/core.c
>> +++ b/drivers/nvme/host/core.c
>> @@ -3951,6 +3951,9 @@ static void nvme_ns_remove(struct nvme_ns *ns)
>> if (!nvme_ns_head_multipath(ns->head))
>> nvme_cdev_del(&ns->cdev, &ns->cdev_device);
>> +
>> + nvme_mpath_remove_sysfs_link(ns);
>> +
>> del_gendisk(ns->disk);
>> mutex_lock(&ns->ctrl->namespaces_lock);
>> diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
>> index 518e22dd4f9b..7d9c36a7a261 100644
>> --- a/drivers/nvme/host/multipath.c
>> +++ b/drivers/nvme/host/multipath.c
>> @@ -654,6 +654,8 @@ static void nvme_mpath_set_live(struct nvme_ns *ns)
>> nvme_add_ns_head_cdev(head);
>> }
>> + nvme_mpath_add_sysfs_link(ns);
>> +
>> mutex_lock(&head->lock);
>> if (nvme_path_is_optimized(ns)) {
>> int node, srcu_idx;
> Nearly there.
Thank you for your review comments!
>
> You can only call 'nvme_mpath_add_sysfs_link()' if the gendisk on the head has been created.
>
> And there is one branch in nvme_mpath_add_disk():
>
> if (desc.state) {
>         /* found the group desc: update */
>         nvme_update_ns_ana_state(&desc, ns);
>
> which does not go via nvme_mpath_set_live(), yet a device link would need to be created here, too.
> But you can't call nvme_mpath_add_sysfs_link() from nvme_mpath_add_disk(), as the actual gendisk might only be created
> later on during ANA log parsing.
>
> It is a tangle, and I haven't found a good way out of this.
> But I am _very much_ in favour of having these links, so please
> update your patch.
If the disk supports an ANA group then yes, it would go through
nvme_mpath_add_disk()->nvme_update_ns_ana_state(), and later
nvme_update_ns_ana_state() would also fall through to nvme_mpath_set_live(),
where we call nvme_mpath_add_sysfs_link().
So I think that in any case, while a multipath namespace is being created, it
has to go through nvme_mpath_set_live(). And as we can see in
nvme_mpath_set_live(), we only create the sysfs link after the gendisk on the
head has been created. Do you agree with this? Or please let me know if you
have any further questions.
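For completeness, here is a rough sketch of the ordering I'm describing: the
link is only created once the head gendisk is live, so calling this from
nvme_mpath_set_live() (i.e. after device_add_disk() has run and
NVME_NSHEAD_DISK_LIVE is set) is always safe. This is an illustration rather
than the exact code in the patch, and it assumes a named "multipath" attribute
group was already registered on the head disk:

void nvme_mpath_add_sysfs_link(struct nvme_ns *ns)
{
	struct nvme_ns_head *head = ns->head;
	struct kobject *parent, *target;

	/* nothing to link to until the head gendisk has been added */
	if (!test_bit(NVME_NSHEAD_DISK_LIVE, &head->flags))
		return;

	parent = &disk_to_dev(head->disk)->kobj;	/* /sys/block/nvmeXnY   */
	target = &disk_to_dev(ns->disk)->kobj;		/* /sys/block/nvmeXcYnZ */

	if (sysfs_add_link_to_group(parent, "multipath", target,
				    ns->disk->disk_name))
		dev_warn(disk_to_dev(head->disk),
			 "failed to create multipath link to %s\n",
			 ns->disk->disk_name);
}

The remove side (nvme_mpath_remove_sysfs_link()) would then undo this with
sysfs_remove_link_from_group() before del_gendisk(), matching the core.c hunk
quoted above.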
> Cheers,
>
> Hannes
Thanks,
--Nilay