[RFC PATCH v2 0/1] Add visibility for native NVMe multipath using sysfs

Nilay Shroff nilay at linux.ibm.com
Sun Aug 11 11:51:28 PDT 2024



On 8/10/24 02:04, Keith Busch wrote:
> On Fri, Aug 09, 2024 at 10:59:56PM +0530, Nilay Shroff wrote:
>> # cat /sys/block/nvme0n1/multipath/queue_depth 
>> nvme0c0n1 423
>> nvme0c3n1 425
> 
> I am not sure this matches up to the desire for 1 value per file. What I
> thought we wanted was a directory for each path, so it'd look something
> like:
> 
>  /sys/block/nvme0n1/multipath/nvme0c0n1/
>  /sys/block/nvme0n1/multipath/nvme0c3n1/
> 
> And each directory has their attributes so they print exactly one value
> instead of the multi-line output. You'd know which path the output
> corresponds to from the file's directory.
> 
Yes, you're right, we need one value per file. I had thought that keeping 
the multipath details concise in a single file might make it easier for 
libnvme/nvme-cli to format and parse.

I also read in the sysfs documentation that an attribute file "should 
preferably contain only one value per file. However, it is noted that it 
may not be efficient to contain only one value per file, so it is socially 
acceptable to express an array of values of the same type."

Anyway, I believe most of us prefer the one-value-per-file rule for sysfs 
attributes, so how about exporting the multipath details as shown below?

Let's assume a namespace head node nvmeXnY points to two different paths: 
nvmeXc1nY and nvmeXc3nY.

First, we create a "multipath" directory under /sys/block/nvmeXnY.

The multipath directory would then contain two entries, named nvmeXc1nY and 
nvmeXc3nY, one for each path referenced by the namespace head node. These 
entries are in fact soft links to the respective namespace block devices 
under /sys/block.

For instance, consider a namespace head node nvme1n1 which points to two 
different paths: nvme1c1n1 and nvme1c3n1.

# ls -l /sys/block/nvme1n1/multipath
lrwxrwxrwx. 1 root root     0 Aug 11 12:30 nvme1c1n1 -> ../../../../../pci052e:78/052e:78:00.0/nvme/nvme1/nvme1c1n1
lrwxrwxrwx. 1 root root     0 Aug 11 12:30 nvme1c3n1 -> ../../../../../pci058e:78/058e:78:00.0/nvme/nvme3/nvme1c3n1

# ls -l /sys/block/
lrwxrwxrwx. 1 root root 0 Aug 11 12:30 nvme1c1n1 -> ../devices/pci052e:78/052e:78:00.0/nvme/nvme1/nvme1c1n1
lrwxrwxrwx. 1 root root 0 Aug 11 12:30 nvme1c3n1 -> ../devices/pci058e:78/058e:78:00.0/nvme/nvme3/nvme1c3n1

As we can see above, /sys/block/nvme1n1/multipath/nvme1c1n1 is a soft link 
to /sys/block/nvme1c1n1 and, similarly, /sys/block/nvme1n1/multipath/nvme1c3n1
is a soft link to /sys/block/nvme1c3n1.
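
For illustration only, the directory and per-path symlinks could be created 
with the standard kobject/sysfs helpers along the lines of the sketch below. 
The names used here (nvme_add_multipath_link, head_disk, path_disk) are made 
up for this example and are not taken from the actual patch:

/*
 * Illustrative sketch only -- not the actual patch code.
 * Create /sys/block/<head>/multipath and add one symlink per path,
 * pointing at the path's block device kobject under /sys/block.
 */
#include <linux/blkdev.h>
#include <linux/errno.h>
#include <linux/kobject.h>
#include <linux/sysfs.h>

static int nvme_add_multipath_link(struct gendisk *head_disk,
				   struct gendisk *path_disk)
{
	struct kobject *head_kobj = &disk_to_dev(head_disk)->kobj;
	struct kobject *path_kobj = &disk_to_dev(path_disk)->kobj;
	struct kobject *dir;

	/* /sys/block/nvmeXnY/multipath (in practice created only once) */
	dir = kobject_create_and_add("multipath", head_kobj);
	if (!dir)
		return -ENOMEM;

	/* /sys/block/nvmeXnY/multipath/nvmeXcZnY -> /sys/block/nvmeXcZnY */
	return sysfs_create_link(dir, path_kobj, path_disk->disk_name);
}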

For the round-robin I/O policy, we can easily infer from the above output 
that I/O targeted at nvme1n1 would alternate between nvme1c1n1 and nvme1c3n1.

We also create two new sysfs attribute files named "numa_nodes" and "queue_depth" 
under /sys/block/nvmeXcYnZ. 

# cat  /sys/block/nvme1n1/multipath/nvme1c1n1/numa_nodes 
0-1
# cat  /sys/block/nvme1n1/multipath/nvme1c3n1/numa_nodes 
2-3

For the numa I/O policy, the above output signifies that I/O targeted at 
nvme1n1 from workloads running on nodes 0 and 1 would prefer the path 
nvme1c1n1. Similarly, I/O targeted at nvme1n1 from workloads running on 
nodes 2 and 3 would prefer the path nvme1c3n1.
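
As a rough illustration, numa_nodes could be a plain read-only device 
attribute that prints a nodemask as a range list. The per-path field used 
here (path_node_mask) is assumed for the sake of the example; the actual 
patch may track the node-to-path mapping differently:

/*
 * Illustrative sketch only -- "path_node_mask" is an assumed per-path
 * nodemask of the NUMA nodes that prefer this path.
 */
#include <linux/device.h>
#include <linux/nodemask.h>
#include <linux/sysfs.h>

static nodemask_t path_node_mask;	/* nodes that prefer this path */

static ssize_t numa_nodes_show(struct device *dev,
			       struct device_attribute *attr, char *buf)
{
	/* "%*pbl" prints the nodemask as a range list, e.g. "0-1" */
	return sysfs_emit(buf, "%*pbl\n", nodemask_pr_args(&path_node_mask));
}
static DEVICE_ATTR_RO(numa_nodes);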

# cat /sys/block/nvme1n1/multipath/nvme1c1n1/queue_depth
423
# cat /sys/block/nvme1n1/multipath/nvme1c3n1/queue_depth
425

For the queue-depth I/O policy, the above output signifies that I/O targeted 
at nvme1n1 has two available paths, nvme1c1n1 and nvme1c3n1, whose current 
queue depths are 423 and 425 respectively.
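
Along the same lines, a minimal sketch of the queue_depth attribute, 
assuming the queue-depth iopolicy keeps a per-path counter of outstanding 
requests (the nr_outstanding name below is invented for this example):

/*
 * Illustrative sketch only -- "nr_outstanding" stands in for whatever
 * per-path accounting the queue-depth iopolicy actually maintains.
 */
#include <linux/atomic.h>
#include <linux/device.h>
#include <linux/sysfs.h>

static atomic_t nr_outstanding;		/* requests in flight on this path */

static ssize_t queue_depth_show(struct device *dev,
				struct device_attribute *attr, char *buf)
{
	return sysfs_emit(buf, "%d\n", atomic_read(&nr_outstanding));
}
static DEVICE_ATTR_RO(queue_depth);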

Thanks,
--Nilay