[PATCHv3 RFC 1/1] nvme-multipath: Add sysfs attributes for showing multipath info

Daniel Wagner dwagner at suse.de
Mon Sep 9 05:40:44 PDT 2024


On Tue, Sep 03, 2024 at 07:22:19PM GMT, Nilay Shroff wrote:
> NVMe native multipath supports different I/O policies for selecting
> the I/O path, but we currently have no visibility into which path the
> multipath code selects for forwarding I/O.
> This patch adds that visibility through new sysfs attribute files
> named "numa_nodes" and "queue_depth" under each namespace block device
> path /sys/block/nvmeXcYnZ/. We also create a "multipath" sysfs
> directory under the head disk node and, from this directory, add a
> link to each namespace path device this head disk node points to.
> 
> For instance, /sys/block/nvmeXnY/multipath/ would contain a soft link
> to each path the head disk node <nvmeXnY> points to:
> 
> $ ls -l /sys/block/nvme1n1/multipath/
> nvme1c1n1 -> ../../../../../pci052e:78/052e:78:00.0/nvme/nvme1/nvme1c1n1
> nvme1c3n1 -> ../../../../../pci058e:78/058e:78:00.0/nvme/nvme3/nvme1c3n1
> 
> For round-robin I/O policy, we can easily infer from the above output
> that I/O targeted at nvme1n1 would toggle across the paths nvme1c1n1
> and nvme1c3n1.
> 
> For numa I/O policy, the "numa_nodes" attribute file shows the NUMA
> nodes preferred by the respective block device path. The value is a
> comma-delimited list of nodes, or A-B ranges of nodes.
> 
> For queue-depth I/O policy, the "queue_depth" attribute file shows the
> number of active/in-flight I/O requests currently queued for each path.

As far as I can tell, this looks good to me.
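
Just to double check my understanding of the interface, reading the new
attributes would look something like this (hypothetical paths and
values, assuming a setup like your example above):

  $ cat /sys/block/nvme1c1n1/numa_nodes
  0-1,3

  $ cat /sys/block/nvme1c1n1/queue_depth
  518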

> +static ssize_t numa_nodes_show(struct device *dev, struct device_attribute *attr,
> +		char *buf)
> +{
> +	int node;
> +	nodemask_t numa_nodes;
> +	struct nvme_ns *current_ns;
> +	struct nvme_ns *ns = nvme_get_ns_from_dev(dev);
> +	struct nvme_ns_head *head = ns->head;
> +
> +	nodes_clear(numa_nodes);
> +
> +	for_each_node(node) {
> +		current_ns = srcu_dereference(head->current_path[node],
> +				&head->srcu);

Don't you need to use srcu_read_lock() first?
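
Something like this (just a sketch, untested):

  int srcu_idx;

  srcu_idx = srcu_read_lock(&head->srcu);
  for_each_node(node) {
          current_ns = srcu_dereference(head->current_path[node],
                          &head->srcu);
          if (ns == current_ns)
                  node_set(node, numa_nodes);
  }
  srcu_read_unlock(&head->srcu, srcu_idx);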

> +		if (ns == current_ns)
> +			node_set(node, numa_nodes);

And if ns matches current_ns, can't you break out of the loop? Or can
the same path be the current path for more than one node, in which case
the loop has to keep scanning the remaining nodes?
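
I.e. something like (sketch):

  if (ns == current_ns) {
          node_set(node, numa_nodes);
          break;
  }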

Thanks,
Daniel


