[PATCH 0/7] nvme-cli: add nvme top command for real-time monitoring
Nilay Shroff
nilay at linux.ibm.com
Mon May 11 07:04:07 PDT 2026
On 5/11/26 6:24 PM, Sagi Grimberg wrote:
>
>
> On 11/05/2026 14:59, Nilay Shroff wrote:
>> On 5/11/26 4:04 AM, Sagi Grimberg wrote:
>>>
>>>
>>> On 30/04/2026 13:52, Nilay Shroff wrote:
>>>> Hi,
>>>>
>>>> Monitoring NVMe devices and paths in production is currently limited to
>>>> static snapshots via nvme-cli. While this is sufficient for basic
>>>> inspection, it is not ideal for NVMe-oF (fabrics) deployments where path
>>>> conditions can change dynamically due to varying network latency,
>>>> congestion, or link failures.
>>>>
>>>> In multipath environments, administrators often need continuous
>>>> visibility into path state, ANA status, queue depth, link speed, and
>>>> error counters. Today, this typically requires repeatedly invoking
>>>> commands or relying on ad-hoc tooling, making it harder to quickly
>>>> identify issues.
>>>>
>>>> This patch series introduces "nvme top", a tool for real-time monitoring
>>>> of NVMe devices and fabrics paths, similar in spirit to tools such as
>>>> top or iotop. The goal is to provide a continuously updating view of
>>>> device and path health, enabling faster detection of link degradation,
>>>> multipath imbalances, and transient failures.
>>>>
>>>> The series first adds the necessary building blocks for supporting a
>>>> top-like dashboard. The initial patches extend the table APIs (including
>>>> support for additional data types such as unsigned, long, float, and
>>>> double) and introduce a generic dashboard framework. The final patch
>>>> adds the nvme top command built on top of this framework.
>>>>
>>>> Future work:
>>>> - Export NVMe statistics to external monitoring systems (e.g. Grafana).
>>>> - Improve topology change detection in multipath configurations. The
>>>> current implementation relies on kobject uevents for topology change,
>>>> but namespace path add/delete events are not exported by the kernel
>>>> since they are associated with hidden gendisk kobjects. This may
>>>> require explicit uevent generation from the NVMe driver for namespace
>>>> path changes.
>>>> - Wire nvme top into an MCP pipeline and feed it to an LLM
>>>
>>> Nice. However, I think the traddr information is missing. Often the network
>>> has routing issues for a specific IP, and this tool should show that.
>>
>> This tool prints the traddr but NOT host_traddr. Did you mean we should print host_traddr?
>> If yes, then I think that should be a fair ask.
>
> I didn't see the traddr...
>
> Also, ctrl+c is not exiting, which is annoying ;)
Yes, that will be fixed in the next version.
Thanks,
--Nilay
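
For reference on the traddr versus host_traddr distinction discussed above:
on fabrics controllers, the kernel's sysfs "address" attribute (for example
/sys/class/nvme/nvme0/address) typically carries both as comma-separated
key=value pairs. The following is only a minimal Python sketch under that
assumption. The helper name parse_ctrl_address and the sample string are
hypothetical and are not part of nvme-cli or of this patch series.

```python
def parse_ctrl_address(address: str) -> dict[str, str]:
    """Split a fabrics address string into its key=value fields.

    Assumed input format (hypothetical sample, modelled on the sysfs
    'address' attribute of a fabrics controller):
        traddr=192.168.1.10,trsvcid=4420,host_traddr=192.168.1.20
    Parts without '=' (e.g. a PCIe address) are ignored.
    """
    fields: dict[str, str] = {}
    for part in address.strip().split(","):
        key, sep, value = part.partition("=")
        if sep:  # keep only well-formed key=value parts
            fields[key] = value
    return fields


# Hypothetical sample string, not captured from a real system.
sample = "traddr=192.168.1.10,trsvcid=4420,host_traddr=192.168.1.20"
parsed = parse_ctrl_address(sample)

# traddr is the target-side address; host_traddr is the local source address.
print("traddr:", parsed.get("traddr"))
print("host_traddr:", parsed.get("host_traddr", "(not set)"))
```

On a live system the input would be read from the sysfs attribute instead of
the sample string. host_traddr may be absent when no source address was given
at connect time, which is why the sketch treats it as optional.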