[RFC PATCH v2 0/1] Add visibility for native NVMe multipath using sysfs
Nilay Shroff
nilay at linux.ibm.com
Wed Aug 28 07:50:40 PDT 2024
Hi Keith,
A gentle ping for this RFC. It would be really helpful if you could share your feedback/comments.
Thanks,
--Nilay
On 8/21/24 10:52, Nilay Shroff wrote:
> Hi Keith,
>
>>>
>>> /sys/block/nvme0n1/multipath/nvme0c0n1/
>>> /sys/block/nvme0n1/multipath/nvme0c3n1/
>>>
>>> And each directory has its own attributes, so each prints exactly one
>>> value instead of a multi-line output. You'd know which path the output
>>> corresponds to from the file's directory.
>>>
>> Yes, you're right that we need one value per file, but I thought keeping
>> the multipath details concisely in a single file might make it easier for
>> libnvme/nvme-cli to format and parse.
>>
>> I also read in the sysfs documentation, "attribute file should preferably
>> contain only one value per file. However, it is noted that it may not be
>> efficient to contain only one value per file, so it is socially acceptable
>> to express an array of values of the same type."
>>
>> Anyway, I believe most of us prefer the one-value-per-file rule for sysfs
>> attributes, so how about exporting the multipath details as shown below?
>>
>> Let's assume namespace head node nvmeXnY points to two different paths:
>> nvmeXc1nY and nvmeXc3nY.
>>
>> First we create the "multipath" directory under /sys/block/nvmeXnY
>>
>> The multipath directory then contains two sub-directories (the two paths
>> pointed to by the namespace head node) named nvmeXc1nY and nvmeXc3nY.
>> In fact, these two sub-directories are symlinks to the respective
>> namespace block devices under /sys/block.
>>
>> For instance, we have a namespace head node nvme1n1 which points to two
>> different paths: nvme1c1n1 and nvme1c3n1.
>>
>> # ls -l /sys/block/nvme1n1/multipath
>> lrwxrwxrwx. 1 root root 0 Aug 11 12:30 nvme1c1n1 -> ../../../../../pci052e:78/052e:78:00.0/nvme/nvme1/nvme1c1n1
>> lrwxrwxrwx. 1 root root 0 Aug 11 12:30 nvme1c3n1 -> ../../../../../pci058e:78/058e:78:00.0/nvme/nvme3/nvme1c3n1
>>
>> # ls -l /sys/block/
>> lrwxrwxrwx. 1 root root 0 Aug 11 12:30 nvme1c1n1 -> ../devices/pci052e:78/052e:78:00.0/nvme/nvme1/nvme1c1n1
>> lrwxrwxrwx. 1 root root 0 Aug 11 12:30 nvme1c3n1 -> ../devices/pci058e:78/058e:78:00.0/nvme/nvme3/nvme1c3n1
>>
>> As seen above, /sys/block/nvme1n1/multipath/nvme1c1n1 is a symlink to
>> /sys/block/nvme1c1n1, and similarly /sys/block/nvme1n1/multipath/nvme1c3n1
>> is a symlink to /sys/block/nvme1c3n1.
>>
>> For the round-robin I/O policy, we can easily infer from the above output
>> that I/O targeted at nvme1n1 would alternate between nvme1c1n1 and nvme1c3n1.
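To illustrate how userspace might consume this, here is a minimal Python sketch that enumerates the per-path directories under the proposed layout. Since the layout is only a proposal, the sketch builds a mock directory tree instead of reading the real /sys, and `multipath_members` is a hypothetical helper, not a libnvme/nvme-cli API:

```python
import os
import tempfile

# Build a mock of the proposed sysfs layout; the real directories would
# live under /sys/block. This only illustrates the directory shape.
root = tempfile.mkdtemp()
for path in ("nvme1c1n1", "nvme1c3n1"):
    os.makedirs(os.path.join(root, "nvme1n1", "multipath", path))

def multipath_members(sysfs_block, head):
    """Return the per-path directory names under <head>/multipath."""
    mpdir = os.path.join(sysfs_block, head, "multipath")
    return sorted(os.listdir(mpdir))

print(multipath_members(root, "nvme1n1"))  # ['nvme1c1n1', 'nvme1c3n1']
```

A real tool would pass "/sys/block" as `sysfs_block` and could then follow each symlink to find the controller owning that path.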
>>
>> We also create two new sysfs attribute files named "numa_nodes" and "queue_depth"
>> under /sys/block/nvmeXcYnZ.
>>
>> # cat /sys/block/nvme1n1/multipath/nvme1c1n1/numa_nodes
>> 0-1
>> # cat /sys/block/nvme1n1/multipath/nvme1c3n1/numa_nodes
>> 2-3
>>
>> For the numa I/O policy, the above output signifies that I/O targeted at
>> nvme1n1 and running on nodes 0 and 1 would prefer the path nvme1c1n1.
>> Similarly, I/O targeted at nvme1n1 and running on nodes 2 and 3 would
>> prefer the path nvme1c3n1.
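Assuming numa_nodes uses the usual cpulist-style range notation (as in the "0-1" and "2-3" values above), a tool could map a NUMA node to its preferred path along these lines. This is a Python sketch; `parse_numa_nodes` and `preferred_path` are hypothetical helpers, and the node ranges are taken from the example output above:

```python
def parse_numa_nodes(text):
    """Parse a cpulist-style node range such as '0-1' or '0,2-3' into a set."""
    nodes = set()
    for part in text.strip().split(","):
        if "-" in part:
            lo, hi = part.split("-")
            nodes.update(range(int(lo), int(hi) + 1))
        else:
            nodes.add(int(part))
    return nodes

# numa_nodes values from the example output above:
path_nodes = {"nvme1c1n1": parse_numa_nodes("0-1"),
              "nvme1c3n1": parse_numa_nodes("2-3")}

def preferred_path(node):
    """Return the path whose numa_nodes set contains the given node."""
    for path, nodes in path_nodes.items():
        if node in nodes:
            return path
    return None

print(preferred_path(0))  # nvme1c1n1
print(preferred_path(3))  # nvme1c3n1
```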
>>
>> # cat /sys/block/nvme1n1/multipath/nvme1c1n1/queue_depth
>> 423
>> # cat /sys/block/nvme1n1/multipath/nvme1c3n1/queue_depth
>> 425
>>
>> For the queue-depth I/O policy, the above output signifies that I/O
>> targeted at nvme1n1 has two paths, nvme1c1n1 and nvme1c3n1, with current
>> queue depths of 423 and 425 respectively.
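In the same spirit, a tool could use these values to spot the less-busy path, mirroring what the queue-depth policy prefers inside the kernel. A Python sketch with the example values above; `least_busy_path` is a hypothetical helper, and note the kernel's actual selection is internal and the sysfs values are only a snapshot:

```python
def least_busy_path(depths):
    """Pick the path with the smallest reported queue_depth."""
    return min(depths, key=depths.get)

# queue_depth values from the example output above:
depths = {"nvme1c1n1": 423, "nvme1c3n1": 425}
print(least_busy_path(depths))  # nvme1c1n1
```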
>>
>
> A gentle ping about the above proposed solution.
>
> Following your suggestion of one attribute value per file, I reworked the
> proposal and would like to know if you have any further feedback/comments.
> If this looks good, shall I resend RFC v3 with the above changes incorporated?
>
> Thanks,
> --Nilay