[RFC PATCH v2 0/1] Add visibility for native NVMe multipath using sysfs

Nilay Shroff nilay at linux.ibm.com
Tue Aug 20 22:22:37 PDT 2024


Hi Keith,

>>
>>  /sys/block/nvme0n1/multipath/nvme0c0n1/
>>  /sys/block/nvme0n1/multipath/nvme0c3n1/
>>
>> And each directory has its own attributes, so each file prints exactly one
>> value instead of the multi-line output. You'd know which path the output
>> corresponds to from the file's directory.
>>
> Yes, you are right, we need one value per file, but I thought keeping the 
> multipath details concise in a single file might then be easier for 
> libnvme/nvme-cli to format and parse. 
> 
> I also read in the sysfs documentation, "attribute file should preferably 
> contain only one value per file. However, it is noted that it may not be 
> efficient to contain only one value per file, so it is socially acceptable
> to express an array of values of the same type." 
> 
> Anyway, I believe most of us prefer the one-value-per-file rule for sysfs 
> attributes, so how about exporting the multipath details as shown below?
> 
> Let's assume the namespace head node nvmeXnY points to two different paths: 
> nvmeXc1nY and nvmeXc3nY.
> 
> First, we create a "multipath" directory under /sys/block/nvmeXnY.
> 
> The multipath directory would then contain two entries (the two paths pointed 
> to by the namespace head node), named nvmeXc1nY and nvmeXc3nY. These two 
> entries are actually soft links to the respective namespace block devices 
> under /sys/block.
> 
> For instance, suppose we have a namespace head node nvme1n1 which points to 
> two different paths, nvme1c1n1 and nvme1c3n1:
> 
> # ls -l /sys/block/nvme1n1/multipath
> lrwxrwxrwx. 1 root root     0 Aug 11 12:30 nvme1c1n1 -> ../../../../../pci052e:78/052e:78:00.0/nvme/nvme1/nvme1c1n1
> lrwxrwxrwx. 1 root root     0 Aug 11 12:30 nvme1c3n1 -> ../../../../../pci058e:78/058e:78:00.0/nvme/nvme3/nvme1c3n1
> 
> # ls -l /sys/block/
> lrwxrwxrwx. 1 root root 0 Aug 11 12:30 nvme1c1n1 -> ../devices/pci052e:78/052e:78:00.0/nvme/nvme1/nvme1c1n1
> lrwxrwxrwx. 1 root root 0 Aug 11 12:30 nvme1c3n1 -> ../devices/pci058e:78/058e:78:00.0/nvme/nvme3/nvme1c3n1
> 
> As we see above, /sys/block/nvme1n1/multipath/nvme1c1n1 is a soft link to 
> /sys/block/nvme1c1n1 and, similarly, /sys/block/nvme1n1/multipath/nvme1c3n1
> is a soft link to /sys/block/nvme1c3n1. 
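> 
> To illustrate how the links could be wired up, below is a minimal sketch 
> using the standard kobject/sysfs helpers. This is only an illustration, not 
> the actual patch: the helper name and the mpath_kobj field are made up, 
> while kobject_create_and_add() and sysfs_create_link() are existing kernel 
> APIs (error unwinding and path-list locking are omitted for brevity).
> 
>   /*
>    * Sketch only: head->mpath_kobj is a hypothetical field caching the
>    * "multipath" kobject; head->list/ns->siblings are the driver's
>    * existing per-head path list.
>    */
>   static int nvme_add_multipath_links(struct nvme_ns_head *head)
>   {
>           struct nvme_ns *ns;
>           int ret;
> 
>           /* creates /sys/block/nvmeXnY/multipath */
>           head->mpath_kobj = kobject_create_and_add("multipath",
>                           &disk_to_dev(head->disk)->kobj);
>           if (!head->mpath_kobj)
>                   return -ENOMEM;
> 
>           list_for_each_entry(ns, &head->list, siblings) {
>                   /* multipath/nvmeXcYnZ -> the path's own device dir */
>                   ret = sysfs_create_link(head->mpath_kobj,
>                                   &disk_to_dev(ns->disk)->kobj,
>                                   ns->disk->disk_name);
>                   if (ret)
>                           return ret;
>           }
>           return 0;
>   }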
> 
> For the round-robin I/O policy, we could easily infer from the above output 
> that I/O targeted to nvme1n1 would alternate between nvme1c1n1 and nvme1c3n1.
> 
> We also create two new sysfs attribute files, named "numa_nodes" and 
> "queue_depth", under /sys/block/nvmeXcYnZ. 
> 
> # cat /sys/block/nvme1n1/multipath/nvme1c1n1/numa_nodes
> 0-1
> # cat /sys/block/nvme1n1/multipath/nvme1c3n1/numa_nodes
> 2-3
> 
> For the numa I/O policy, the above output signifies that I/O targeted to 
> nvme1n1 and running on nodes 0 and 1 would prefer the path nvme1c1n1. 
> Similarly, I/O targeted to nvme1n1 and running on nodes 2 and 3 would 
> prefer the path nvme1c3n1.
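> 
> The numa_nodes attribute itself could be a plain read-only device attribute. 
> Again just a sketch: the per-path nodemask (ns->numa_nodes below) is an 
> assumed field that the numa iopolicy would need to maintain, whereas 
> nvme_get_ns_from_dev(), sysfs_emit() and nodemask_pr_args() are existing 
> kernel interfaces:
> 
>   /* Sketch: print the NUMA nodes preferring this path as a range
>    * list such as "0-1". ns->numa_nodes is a hypothetical nodemask_t. */
>   static ssize_t numa_nodes_show(struct device *dev,
>                   struct device_attribute *attr, char *buf)
>   {
>           struct nvme_ns *ns = nvme_get_ns_from_dev(dev);
> 
>           return sysfs_emit(buf, "%*pbl\n",
>                           nodemask_pr_args(&ns->numa_nodes));
>   }
>   static DEVICE_ATTR_RO(numa_nodes);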
> 
> # cat /sys/block/nvme1n1/multipath/nvme1c1n1/queue_depth
> 423
> # cat /sys/block/nvme1n1/multipath/nvme1c3n1/queue_depth
> 425
> 
> For the queue-depth I/O policy, the above output signifies that I/O targeted 
> to nvme1n1 has two paths, nvme1c1n1 and nvme1c3n1, whose current queue 
> depths are 423 and 425 respectively.
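> 
> The queue_depth attribute could similarly report the number of requests 
> currently in flight on the path. A sketch, assuming the queue-depth 
> iopolicy keeps an atomic in-flight counter on the controller (nr_active 
> below is an assumed name):
> 
>   /* Sketch: ctrl->nr_active is assumed to be maintained by the
>    * queue-depth iopolicy as requests start and complete. */
>   static ssize_t queue_depth_show(struct device *dev,
>                   struct device_attribute *attr, char *buf)
>   {
>           struct nvme_ns *ns = nvme_get_ns_from_dev(dev);
> 
>           return sysfs_emit(buf, "%d\n",
>                           atomic_read(&ns->ctrl->nr_active));
>   }
>   static DEVICE_ATTR_RO(queue_depth);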
>

A gentle ping about the above proposed solution.
 
Following your suggestion of one attribute value per file, I reworked the proposal and would
like to know whether you have any further feedback or comments. If this looks good, shall I
resend RFC v3 with the above changes incorporated?
 
Thanks,
--Nilay


