[RFC PATCH v2 0/1] Add visibility for native NVMe multipath using sysfs

Nilay Shroff nilay at linux.ibm.com
Wed Aug 28 07:50:40 PDT 2024


Hi Keith,

A gentle ping for this RFC. It would be really helpful if you could share your feedback/comments.

Thanks,
--Nilay

On 8/21/24 10:52, Nilay Shroff wrote:
> Hi Keith,
> 
>>>
>>>  /sys/block/nvme0n1/multipath/nvme0c0n1/
>>>  /sys/block/nvme0n1/multipath/nvme0c3n1/
>>>
>>> And each directory has its attributes so they print exactly one value
>>> instead of the multi-line output. You'd know which path the output
>>> corresponds to from the file's directory.
>>>
>> Yes, you're right, we need one value per file, but I thought keeping the
>> multipath details concisely in a single file might make it easier for
>> libnvme/nvme-cli to format and parse.
>>
>> I also read in the sysfs documentation, "attribute file should preferably 
>> contain only one value per file. However, it is noted that it may not be 
>> efficient to contain only one value per file, so it is socially acceptable
>> to express an array of values of the same type." 
>>
>> Anyway, I believe most of us prefer the one-value-per-file rule for sysfs
>> attributes, so how about exporting the multipath details as shown below?
>>
>> Let's assume a namespace head node nvmeXnY points to two different paths:
>> nvmeXc1nY and nvmeXc3nY.
>>
>> First, we create a "multipath" directory under /sys/block/nvmeXnY.
>>
>> The multipath directory would then contain two sub-directories (the two
>> paths pointed to by the namespace head node), named nvmeXc1nY and nvmeXc3nY.
>> These two sub-directories are in fact soft links to the respective
>> namespace block devices under /sys/block.
>>
>> For instance, we have a namespace head node nvme1n1 which points to two 
>> different paths: nvme1c1n1 and nvme1c3n1.
>>
>> # ls -l /sys/block/nvme1n1/multipath
>> lrwxrwxrwx. 1 root root     0 Aug 11 12:30 nvme1c1n1 -> ../../../../../pci052e:78/052e:78:00.0/nvme/nvme1/nvme1c1n1
>> lrwxrwxrwx. 1 root root     0 Aug 11 12:30 nvme1c3n1 -> ../../../../../pci058e:78/058e:78:00.0/nvme/nvme3/nvme1c3n1
>>
>> # ls -l /sys/block/
>> lrwxrwxrwx. 1 root root 0 Aug 11 12:30 nvme1c1n1 -> ../devices/pci052e:78/052e:78:00.0/nvme/nvme1/nvme1c1n1
>> lrwxrwxrwx. 1 root root 0 Aug 11 12:30 nvme1c3n1 -> ../devices/pci058e:78/058e:78:00.0/nvme/nvme3/nvme1c3n1
>>
>> As we see above, /sys/block/nvme1n1/multipath/nvme1c1n1 is a soft link to
>> /sys/block/nvme1c1n1, and similarly /sys/block/nvme1n1/multipath/nvme1c3n1
>> is a soft link to /sys/block/nvme1c3n1.
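>>
>> Kernel-side, the directory and links could be created roughly as below.
>> This is an untested sketch, not the actual patch: the mpath_kobj field
>> and the nvme_add_path_link() helper are made-up names for illustration.
>>
>> #include <linux/kobject.h>
>> #include <linux/sysfs.h>
>> #include "nvme.h"
>>
>> static int nvme_add_path_link(struct nvme_ns_head *head, struct nvme_ns *ns)
>> {
>>         /* /sys/block/nvmeXnY/multipath/, created once per head node */
>>         if (!head->mpath_kobj) {
>>                 head->mpath_kobj = kobject_create_and_add("multipath",
>>                                 &disk_to_dev(head->disk)->kobj);
>>                 if (!head->mpath_kobj)
>>                         return -ENOMEM;
>>         }
>>
>>         /* multipath/nvmeXcYnZ -> the per-path block device's kobject */
>>         return sysfs_create_link(head->mpath_kobj,
>>                                  &disk_to_dev(ns->disk)->kobj,
>>                                  ns->disk->disk_name);
>> }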
>>
>> For the round-robin I/O policy, we can easily infer from the above output
>> that an I/O workload targeted at nvme1n1 would alternate between nvme1c1n1
>> and nvme1c3n1.
>>
>> We also create two new sysfs attribute files named "numa_nodes" and "queue_depth" 
>> under /sys/block/nvmeXcYnZ. 
>>
>> # cat  /sys/block/nvme1n1/multipath/nvme1c1n1/numa_nodes 
>> 0-1
>> # cat  /sys/block/nvme1n1/multipath/nvme1c3n1/numa_nodes 
>> 2-3
>>
>> For the numa I/O policy, the above output signifies that an I/O workload
>> targeted at nvme1n1 and running on nodes 0 and 1 would prefer using path
>> nvme1c1n1. Similarly, an I/O workload targeted at nvme1n1 and running on
>> nodes 2 and 3 would prefer using path nvme1c3n1.
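>>
>> Internally, the numa_nodes value for a path could be derived from the
>> existing head->current_path[] cache; a rough, untested sketch, continuing
>> in the multipath.c context from the sketch above:
>>
>> static ssize_t numa_nodes_show(struct device *dev,
>>                 struct device_attribute *attr, char *buf)
>> {
>>         struct nvme_ns *ns = nvme_get_ns_from_dev(dev);
>>         struct nvme_ns_head *head = ns->head;
>>         nodemask_t numa_nodes;
>>         int node, srcu_idx;
>>
>>         nodes_clear(numa_nodes);
>>
>>         srcu_idx = srcu_read_lock(&head->srcu);
>>         for_each_node(node) {
>>                 /* mark nodes whose currently preferred path is this ns */
>>                 if (ns == srcu_dereference(head->current_path[node],
>>                                            &head->srcu))
>>                         node_set(node, numa_nodes);
>>         }
>>         srcu_read_unlock(&head->srcu, srcu_idx);
>>
>>         /* "%*pbl" prints the mask as a range list, e.g. "0-1" */
>>         return sysfs_emit(buf, "%*pbl\n", nodemask_pr_args(&numa_nodes));
>> }
>> static DEVICE_ATTR_RO(numa_nodes);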
>>
>> # cat /sys/block/nvme1n1/multipath/nvme1c1n1/queue_depth
>> 423
>> # cat /sys/block/nvme1n1/multipath/nvme1c3n1/queue_depth
>> 425
>>
>> For the queue-depth I/O policy, the above output signifies that an I/O
>> workload targeted at nvme1n1 has two paths, nvme1c1n1 and nvme1c3n1, whose
>> current queue depths are 423 and 425 respectively.
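>>
>> The queue_depth attribute could simply report the per-controller count of
>> in-flight requests that the queue-depth iopolicy already keeps. Another
>> rough, untested sketch, assuming the ctrl->nr_active counter introduced by
>> the queue-depth iopolicy patch:
>>
>> static ssize_t queue_depth_show(struct device *dev,
>>                 struct device_attribute *attr, char *buf)
>> {
>>         struct nvme_ns *ns = nvme_get_ns_from_dev(dev);
>>
>>         /* only meaningful while the queue-depth iopolicy is active */
>>         if (READ_ONCE(ns->head->subsys->iopolicy) != NVME_IOPOLICY_QD)
>>                 return 0;
>>
>>         return sysfs_emit(buf, "%d\n", atomic_read(&ns->ctrl->nr_active));
>> }
>> static DEVICE_ATTR_RO(queue_depth);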
>>
> 
> A gentle ping about the above proposed solution.
>  
> Following your suggestion of one attribute value per file, I reworked the
> proposal and would like to know if you have any further feedback/comments.
> If this looks good, shall I resend the RFC as v3 with the above changes
> incorporated?
>  
> Thanks,
> --Nilay
