[PATCHv2 2/4] nvme: extend show-topology command to add support for multipath

Nilay Shroff nilay at linux.ibm.com
Tue Sep 2 21:22:13 PDT 2025



On 9/2/25 11:56 AM, Hannes Reinecke wrote:
> On 9/1/25 18:36, Daniel Wagner wrote:
>> Hi Nilay,
>>
>> On Mon, Sep 01, 2025 at 02:51:09PM +0530, Nilay Shroff wrote:
>>> Hi Daniel and Hannes,
>>>
>>> Just a gentle ping on this one...
>>>
>>> Do you agree with the reasoning I suggested for filtering
>>> columns based on iopolicy? If we all have agreement then
>>> I'd send out the next patchset with appropriate change.
>>
>> I was waiting for Hannes input here as he was in discussion.
>>
> Yeah, and he is fine with the latest changes. Probably should've been
> a bit more vocal about it :-)
> 
>>>>> But really, I'm not sure if we should print out values from the various
>>>>> I/O policies. For NUMA it probably makes sense, but for round-robin and
>>>>> queue-depths the values are extremely volatile, so I wonder what benefit
>>>>> for the user is here.
>>>>>
>>>>
>>>> I think the qdepth output could still be useful. For example, if I/Os are
>>>> queuing up on one path (perhaps because that path is slower), then the Qdepth
>>>> value might help indicate something unusual or explain why one path is being
>>>> chosen over another.
>>>>
>>>> That said, if we all agree that tools or scripts should ideally rely on JSON
>>>> output for parsing, then the tabular output could be simplified further:
>>>>
>>>> - For numa iopolicy: print <Nodes> and exclude <Qdepth>.
>>>> - For queue-depth iopolicy: print <Qdepth> and exclude <Nodes>.
>>>> - For round-robin iopolicy: exclude both <Nodes> and <Qdepth>.
>>
>> Looks reasonable to me.
>>
> Yep, that's fine.
> 
>>>> Does this sound reasonable? Or do we still want to avoid printing
>>>> <Qdepth> even for queue-depth iopolicy?
>>
>> I am fine with printing the qdepth value as long as it is documented what it
>> means. IIRC there are other tools which just show a snapshot for some
>> statistics.
>>
>> BTW, some discussion on github regarding something like a
>> 'monitor' feature: https://github.com/linux-nvme/nvme-cli/issues/2189
>> Might be something which could be considered here as well.
>
> Well, that might be tricky. The current 'tree' structure is built
> once when the program starts up. Any changes to that structure after
> that are not tracked, and there (currently) are no hooks for updating
> it. So having a 'monitor' function will get tricky.
> 
> To do that we would either need a udev event receiver (using uevents
> to update the tree structure) or look into fanotify()/inotify().
> Then the tree structure would always be up-to-date and things like
> 'monitor' would be possible.
> Will be hell for the python bindings, mind :-(
> 
Thanks, that explanation helps. I see the difference: today nvme-cli
just builds the tree once and prints a snapshot, so volatile values like
Qdepth are inherently a "point in time" view. A monitor-style feature would
instead keep the tree live and update it as ANA state, path health, or qdepth
changes (similar to how iostat or top refresh continuously). That
definitely makes sense for those kinds of fields, though as you say it would
require new hooks (udev/inotify) and rework of the in-memory tree handling.
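
Just to make sure I understand the udev approach, something along these
lines is what you have in mind, right? (a rough, untested sketch using
libudev; the tree update is left as a stub and nothing here is wired into
the existing nvme-cli code):

#include <libudev.h>
#include <poll.h>
#include <stdio.h>

/*
 * Rough sketch only: listen for nvme uevents and refresh the in-memory
 * topology tree on each event. Error handling is trimmed, and matching
 * only the "nvme" subsystem is a simplification (namespaces show up
 * under "block", so a real implementation would need more filters).
 * Link with -ludev.
 */
int monitor_nvme_uevents(void)
{
        struct udev *udev = udev_new();
        struct udev_monitor *mon = udev_monitor_new_from_netlink(udev, "udev");
        struct pollfd pfd;

        udev_monitor_filter_add_match_subsystem_devtype(mon, "nvme", NULL);
        udev_monitor_enable_receiving(mon);

        pfd.fd = udev_monitor_get_fd(mon);
        pfd.events = POLLIN;

        while (poll(&pfd, 1, -1) > 0) {
                struct udev_device *dev = udev_monitor_receive_device(mon);

                if (!dev)
                        continue;
                /*
                 * "add"/"remove"/"change" on a controller: this is where
                 * the tree would be updated and the view refreshed,
                 * instead of just printing the event.
                 */
                printf("%s %s\n", udev_device_get_action(dev),
                       udev_device_get_syspath(dev));
                udev_device_unref(dev);
        }

        udev_monitor_unref(mon);
        udev_unref(udev);
        return 0;
}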

For this patchset, I’d prefer to keep the scope limited and just filter the
tabular output based on iopolicy (NUMA -> Nodes, qdepth -> Qdepth, RR -> none).
That way the snapshot view stays useful. 
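
To make the column-selection rule concrete, it could be as simple as the
sketch below (the enum and helper names here are made up for illustration,
not the existing nvme-cli internals):

#include <stdbool.h>
#include <string.h>

enum iopolicy { IOPOLICY_NUMA, IOPOLICY_RR, IOPOLICY_QDEPTH };

static enum iopolicy parse_iopolicy(const char *s)
{
        if (!strcmp(s, "numa"))
                return IOPOLICY_NUMA;
        if (!strcmp(s, "queue-depth"))
                return IOPOLICY_QDEPTH;
        return IOPOLICY_RR;     /* "round-robin" or anything unknown */
}

/* Print <Nodes> only for numa and <Qdepth> only for queue-depth. */
static bool show_nodes_col(enum iopolicy p)  { return p == IOPOLICY_NUMA; }
static bool show_qdepth_col(enum iopolicy p) { return p == IOPOLICY_QDEPTH; }

The JSON output would keep emitting all fields regardless of iopolicy, so
scripts that parse it don't have to care about this filtering.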

Having said that, the “monitor” idea seems worth pursuing separately as a
longer-term feature. For users who want a live view today, they can already
run something like "watch -n1 nvme show-topology ..." to refresh the snapshot
every second. A built-in --monitor feature would be nicer and more efficient,
but I think that’s orthogonal and can be discussed separately. I also believe
a "monitor" feature would be useful for implementing an nvme-top command
(as I recall, Daniel and I discussed it during LSFM-2025).

Thanks,
--Nilay


