[RFC PATCH v2 0/1] Add visibility for native NVMe multipath using sysfs
Nilay Shroff
nilay at linux.ibm.com
Tue Aug 20 22:22:37 PDT 2024
Hi Keith,
>>
>> /sys/block/nvme0n1/multipath/nvme0c0n1/
>> /sys/block/nvme0n1/multipath/nvme0c3n1/
>>
>> And each directory has its own attributes, so each file prints exactly
>> one value instead of the multi-line output. You'd know which path the
>> output corresponds to from the file's directory.
>>
> Yes, you're right that we need one value per file, but I thought keeping
> the multipath details concisely in a single file might make it easier for
> libnvme/nvme-cli to format and parse.
>
> I also read in the sysfs documentation that an attribute file "should
> preferably contain only one value per file. However, it is noted that it
> may not be efficient to contain only one value per file, so it is socially
> acceptable to express an array of values of the same type."
>
> Anyway, I believe most of us prefer the one-value-per-file rule for sysfs
> attributes, so how about exporting the multipath details as shown below?
>
> Let's assume the namespace head node nvmeXnY points to two different
> paths: nvmeXc1nY and nvmeXc3nY.
>
> First, we create a "multipath" directory under /sys/block/nvmeXnY.
>
> The multipath directory would then contain two entries, nvmeXc1nY and
> nvmeXc3nY, one for each path pointed to by the namespace head node. These
> two entries are in fact soft links to the respective namespace block
> devices under /sys/block, created roughly as sketched below.
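>
> Here is a minimal sketch of how the directory and links might be created
> on the kernel side; the function names and structure are illustrative
> assumptions, not the actual patch:
>
> #include <linux/blkdev.h>
> #include <linux/kobject.h>
> #include <linux/sysfs.h>
>
> /* Sketch only: create /sys/block/<head>/multipath once per head node. */
> static struct kobject *nvme_mpath_dir(struct gendisk *head_disk)
> {
> 	return kobject_create_and_add("multipath",
> 				      &disk_to_dev(head_disk)->kobj);
> }
>
> /* Sketch only: add multipath/<path> as a soft link to the path's block
>  * device, e.g. multipath/nvme1c1n1 -> .../nvme1/nvme1c1n1. */
> static int nvme_mpath_add_link(struct kobject *mpath_dir,
> 			       struct gendisk *path_disk)
> {
> 	return sysfs_create_link(mpath_dir, &disk_to_dev(path_disk)->kobj,
> 				 path_disk->disk_name);
> }
>
> On path removal, sysfs_remove_link() plus a final kobject_put() on the
> directory would undo the above.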
>
> For instance, we have a namespace head node nvme1n1 which points to two
> different paths: nvme1c1n1 and nvme1c3n1.
>
> # ls -l /sys/block/nvme1n1/multipath
> lrwxrwxrwx. 1 root root 0 Aug 11 12:30 nvme1c1n1 -> ../../../../../pci052e:78/052e:78:00.0/nvme/nvme1/nvme1c1n1
> lrwxrwxrwx. 1 root root 0 Aug 11 12:30 nvme1c3n1 -> ../../../../../pci058e:78/058e:78:00.0/nvme/nvme3/nvme1c3n1
>
> # ls -l /sys/block/
> lrwxrwxrwx. 1 root root 0 Aug 11 12:30 nvme1c1n1 -> ../devices/pci052e:78/052e:78:00.0/nvme/nvme1/nvme1c1n1
> lrwxrwxrwx. 1 root root 0 Aug 11 12:30 nvme1c3n1 -> ../devices/pci058e:78/058e:78:00.0/nvme/nvme3/nvme1c3n1
>
> As we see above, /sys/block/nvme1n1/multipath/nvme1c1n1 is a soft link to
> /sys/block/nvme1c1n1 and, similarly, /sys/block/nvme1n1/multipath/nvme1c3n1
> is a soft link to /sys/block/nvme1c3n1.
>
> For the round-robin I/O policy, we can easily infer from the above output
> that I/O targeted at nvme1n1 would alternate between nvme1c1n1 and nvme1c3n1.
>
> We also create two new sysfs attribute files, "numa_nodes" and "queue_depth",
> under /sys/block/nvmeXcYnZ.
>
> # cat /sys/block/nvme1n1/multipath/nvme1c1n1/numa_nodes
> 0-1
> # cat /sys/block/nvme1n1/multipath/nvme1c3n1/numa_nodes
> 2-3
>
> For the numa I/O policy, the above output signifies that an I/O workload
> targeted at nvme1n1 and running on nodes 0 and 1 would prefer the path
> nvme1c1n1. Similarly, an I/O workload targeted at nvme1n1 and running on
> nodes 2 and 3 would prefer the path nvme1c3n1.
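>
> A hedged sketch of how a numa_nodes show handler could work, assuming it
> reports the nodes whose cached preferred path (the head node's per-node
> current_path[] table in the driver) is this particular path:
>
> #include <linux/device.h>
> #include <linux/nodemask.h>
> #include <linux/sysfs.h>
>
> static ssize_t numa_nodes_show(struct device *dev,
> 			       struct device_attribute *attr, char *buf)
> {
> 	struct nvme_ns *ns = nvme_get_ns_from_dev(dev);
> 	struct nvme_ns_head *head = ns->head;
> 	nodemask_t nodes;
> 	int node, srcu_idx;
>
> 	nodes_clear(nodes);
> 	srcu_idx = srcu_read_lock(&head->srcu);
> 	/* Collect every node whose cached preferred path is this one. */
> 	for_each_node(node)
> 		if (srcu_dereference(head->current_path[node],
> 				     &head->srcu) == ns)
> 			node_set(node, nodes);
> 	srcu_read_unlock(&head->srcu, srcu_idx);
>
> 	/* "%*pbl" prints the mask as a range list, e.g. "0-1". */
> 	return sysfs_emit(buf, "%*pbl\n", nodemask_pr_args(&nodes));
> }
> static DEVICE_ATTR_RO(numa_nodes);
>
> The attribute would then be registered on each path device with
> device_create_file(disk_to_dev(disk), &dev_attr_numa_nodes).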
>
> # cat /sys/block/nvme1n1/multipath/nvme1c1n1/queue_depth
> 423
> # cat /sys/block/nvme1n1/multipath/nvme1c3n1/queue_depth
> 425
>
> For the queue-depth I/O policy, the above output signifies that I/O targeted
> at nvme1n1 has two paths, nvme1c1n1 and nvme1c3n1, with current queue depths
> of 423 and 425 respectively; the path with the lower queue depth would be
> preferred for new I/O.
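>
> A corresponding sketch for queue_depth, assuming the controller keeps an
> atomic count of requests in flight (the nr_active field name is an
> assumption here):
>
> static ssize_t queue_depth_show(struct device *dev,
> 				struct device_attribute *attr, char *buf)
> {
> 	struct nvme_ns *ns = nvme_get_ns_from_dev(dev);
>
> 	/* nr_active: assumed per-controller count of in-flight requests. */
> 	return sysfs_emit(buf, "%d\n", atomic_read(&ns->ctrl->nr_active));
> }
> static DEVICE_ATTR_RO(queue_depth);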
>
A gentle ping about the proposed solution above.
Following your suggestion of one attribute value per file, I have reworked the
proposal and would like to know whether you have any further feedback or
comments. If this looks good, shall I resend the RFC as v3 with the above
changes incorporated?
Thanks,
--Nilay