[PATCHv2 2/4] nvme: extend show-topology command to add support for multipath

Nilay Shroff nilay at linux.ibm.com
Tue Aug 19 03:31:10 PDT 2025



On 8/19/25 11:45 AM, Hannes Reinecke wrote:
> On 8/19/25 06:49, Nilay Shroff wrote:
>>
>>
>> On 8/18/25 12:52 PM, Hannes Reinecke wrote:
>>> On 8/12/25 14:56, Nilay Shroff wrote:
>>>> This commit enhances the show-topology command by adding support for
>>>> NVMe multipath. With this change, users can now list all paths to a
>>>> namespace from its corresponding head node device. Each NVMe path
>>>> entry then also includes additional details such as ANA state, NUMA
>>>> node, and queue depth, improving visibility into multipath configs.
>>>> This information can be particularly helpful for debugging and
>>>> analyzing NVMe multipath setups.
>>>>
>>>> To support this functionality, the "--ranking" option of the nvme
>>>> show-topology command has been extended with a new sub-option:
>>>> "multipath".
>>>>
>>>> Since this enhancement is specific to NVMe multipath, the iopolicy
>>>> configured under each subsystem is now always displayed. Previously,
>>>> iopolicy was shown only with nvme show-topology verbose output, but
>>>> it is now included by default to improve usability and provide better
>>>> context when reviewing multipath configurations via show-topology.
>>>>
>>>> With this update, users can view the multipath topology of a multi
>>>> controller/port NVMe disk using:
>>>>
>>>> $ nvme show-topology -r multipath
>>>>
>>>> nvme-subsys2 - NQN=nvmet_subsystem
>>>>                  hostnqn=nqn.2014-08.org.nvmexpress:uuid:12b49f6e-0276-4746-b10c-56815b7e6dc2
>>>>                  iopolicy=numa
>>>>
>>>>             _ _ _<head-node>
>>>>            /              _ _ _ <ana-state>
>>>>           /              /      _ _ _ <numa-node-list>
>>>>          /              /      /  _ _ _<queue-depth>
>>>>         |              /      /  /
>>>>    +- nvme2n1 (ns 1)  /      /  /
>>>>    \                 |      |  |
>>>>     +- nvme2c2n1 optimized 1,2 0 nvme2 tcp traddr=127.0.0.2,trsvcid=4460,src_addr=127.0.0.1 live
>>>>     +- nvme2c3n1 optimized 3,4 0 nvme3 tcp traddr=127.0.0.3,trsvcid=4460,src_addr=127.0.0.1 live
>>>>
>>>> Please note that the annotations shown above (e.g., <numa-node-list>,
>>>> <ana-state>, <head-node>, and <queue-depth>) are included for clarity
>>>> only and are not part of the actual output.
>>>>
>>>
>>> Hmm. Why do we have the values for 'numa-node-list' and 'queue-depth'
>>> both in here? They are tied to the selected IO policy, and pretty
>>> meaningless if that IO policy is not selected.
>>> Please include only the values relevant for the selected IO policy;
>>> this will increase readability of the resulting status string.
>>>
>> Okay makes sense, so we'd print <numa-node> and exclude <queue-depth> if iopolicy
>> is numa. For 'queue-depth' iopolicy, we'd print <queue-depth> and exclude <numa-node>.
>> And for 'round-robin' iopolicy, we'd neither print <numa-node> nor <queue-depth>.
>> I'll update this in the next patch.
>>
> Hmm. I'd rather have _some_ value for 'round-robin', too, as otherwise
> the number of fields will be different (which makes parsing harder).
> 
Okay, so how about printing <numa-node> for the round-robin policy as well?

I looked at the NVMe path selection code for the round-robin iopolicy, and it
appears the kernel uses the NUMA node ID of the I/O submitting CPU as the
reference for path selection.
For example, on a system with two NUMA nodes (0 and 1) and two NVMe paths
(PA and PB):
- If an I/O from node 0 selects PA, that choice is cached.
- The next time the kernel receives an I/O from node 0, it retrieves the
  cached value, sees that the last path chosen was PA, and therefore picks
  the next available path, PB, to forward this I/O.

This way the kernel alternates between PA and PB in round-robin fashion.
So the selection is still tied to the submitting NUMA node, just with path
rotation layered on top. Given that, I think it makes sense to also print
<numa-node> for the round-robin iopolicy, to keep the number of fields
consistent and still provide meaningful context. Agreed?
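
For illustration, here is a much-simplified sketch of that per-node rotation.
The names below are made up for this example and do not match the actual
kernel symbols in drivers/nvme/host/multipath.c, which additionally has to
deal with path liveness, ANA state and RCU protection:

#include <stdio.h>

#define NR_NODES 2
#define NR_PATHS 2

static const char *paths[NR_PATHS] = { "PA", "PB" };

/* index of the last path chosen, cached per submitting NUMA node */
static int cached_path[NR_NODES] = { -1, -1 };

/* pick the path after the one cached for this node, wrapping around */
static const char *rr_select_path(int node)
{
        int next = (cached_path[node] + 1) % NR_PATHS; /* -1 -> 0 on first I/O */

        cached_path[node] = next;
        return paths[next];
}

int main(void)
{
        int i;

        /* four I/Os submitted from node 0 alternate PA, PB, PA, PB */
        for (i = 0; i < 4; i++)
                printf("I/O from node 0 -> %s\n", rr_select_path(0));
        return 0;
}

The cache being indexed by NUMA node is exactly why <numa-node> still carries
meaning for the round-robin policy.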

Thanks,
--Nilay


