[PATCH 3/3] tree: add attribute numa_nodes for NVMe path object
Nilay Shroff
nilay at linux.ibm.com
Mon Apr 7 07:19:41 PDT 2025
On 4/7/25 4:40 PM, Daniel Wagner wrote:
> On Mon, Apr 07, 2025 at 03:29:53PM +0530, Nilay Shroff wrote:
>> On 4/7/25 1:14 PM, Daniel Wagner wrote:
>>> On Sat, Apr 05, 2025 at 06:32:49PM +0530, Nilay Shroff wrote:
>>>> Add a new attribute named "numa_nodes" under the NVMe path object. This
>>>> attribute is used by the iopolicy "numa". The numa_nodes value is stored
>>>> for each NVMe path and represents the NUMA node(s) associated with it.
>>>> When the iopolicy is set to "numa", I/O traffic originating from a given
>>>> NUMA node will be forwarded through the corresponding NVMe path.
>>>>
>>>> The numa_nodes attribute is useful for observing which NVMe path the
>>>> kernel would choose for I/O forwarding based on NUMA affinity. To support
>>>> this, export the attribute in libnvme.map so it can be accessed via
>>>> nvme-cli.
>>>
>>> This one has the same limitation as the previous one. Given that libnvme
>>> currently caches everything, we could just accept this limitation for
>>> the time being. Any thoughts on this?
>>
>> Yes, agreed. So how about adding a new API, for instance,
>> nvme_path_get_numa_nodes__no_cached, which would not return
>> the cached value but instead re-read the sysfs attribute
>> and return the latest value? We may similarly extend other
>> APIs where we don't want to retrieve the cached value.
>
> Adding a _no_cached function is certainly an option for libnvme 1.x.
>
> One of the API changes for the next major version of libnvme (aka 2.x)
> is to add a handle to all functions. Currently, we only have it for the
> fabrics API. If such a handle is available, we could add a no-cache flag
> instead of duplicating all functions.
>
> Maybe adding an explicit flags argument for 1.x is an option, or should
> we just keep going with the cached-only approach?

Both approaches (either adding a no-cache flag or introducing dedicated
__no_cached APIs) would work. However, in my opinion, we should aim to
use a consistent method across both libnvme 1.x and 2.x versions.

As we discussed during LSFMM, if we plan to implement an "nvme top" command,
we would need non-cached versions of these APIs even for nvme-cli. So,
using the same mechanism for both versions makes sense. Otherwise, we'd
also have to maintain different logic in nvme top depending on the libnvme
version, which adds unnecessary complexity.
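
To make the two options concrete, here is a rough sketch of how a flags
argument and a dedicated __no_cached variant could sit side by side. This
is illustration only: struct demo_path, DEMO_F_NO_CACHE and the demo_*
functions are made-up names, not existing libnvme symbols, and the
attribute is read from an arbitrary file path rather than through
libnvme's real sysfs helpers.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a path object; NOT the real nvme_path_t. */
struct demo_path {
	char sysfs_attr[256];	/* file that holds the numa_nodes value */
	char *numa_nodes;	/* cached copy, NULL until the first read */
};

/* Hypothetical flag; libnvme does not define this today. */
#define DEMO_F_NO_CACHE	(1u << 0)

/* Read the first line of 'file' without its newline. Caller frees. */
static char *demo_read_attr(const char *file)
{
	char buf[256];
	char *copy;
	size_t len;
	FILE *f = fopen(file, "r");

	if (!f)
		return NULL;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return NULL;
	}
	fclose(f);

	len = strcspn(buf, "\n");	/* length up to the newline, if any */
	copy = malloc(len + 1);
	if (!copy)
		return NULL;
	memcpy(copy, buf, len);
	copy[len] = '\0';
	return copy;
}

/*
 * Option A: one getter with a flags argument. Without DEMO_F_NO_CACHE
 * the cached value is returned once it has been populated; with the
 * flag the attribute is re-read and the cache is refreshed. If the
 * re-read fails, the previously cached value is kept.
 */
const char *demo_path_get_numa_nodes(struct demo_path *p, unsigned int flags)
{
	assert(p != NULL);

	if ((flags & DEMO_F_NO_CACHE) || !p->numa_nodes) {
		char *fresh = demo_read_attr(p->sysfs_attr);

		if (fresh) {
			free(p->numa_nodes);
			p->numa_nodes = fresh;
		}
	}
	return p->numa_nodes;
}

/* Option B: dedicated variant, here a thin wrapper around option A. */
const char *demo_path_get_numa_nodes__no_cached(struct demo_path *p)
{
	return demo_path_get_numa_nodes(p, DEMO_F_NO_CACHE);
}
```

In this sketch a polling tool such as the proposed nvme top would pass
DEMO_F_NO_CACHE (or call the __no_cached variant) on every refresh, while
existing callers pass 0 and keep today's cached behaviour. Whichever
shape is chosen, keeping it identical in 1.x and 2.x is what avoids
version-specific logic in the caller.
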
Thanks,
--Nilay