[PATCH 0/7] nvme-cli: add nvme top command for real-time monitoring
Nilay Shroff
nilay at linux.ibm.com
Mon May 11 04:59:01 PDT 2026
On 5/11/26 4:04 AM, Sagi Grimberg wrote:
>
>
> On 30/04/2026 13:52, Nilay Shroff wrote:
>> Hi,
>>
>> Monitoring NVMe devices and paths in production is currently limited to
>> static snapshots via nvme-cli. While this is sufficient for basic
>> inspection, it is not ideal for NVMe-oF (fabrics) deployments where path
>> conditions can change dynamically due to varying network latency,
>> congestion, or link failures.
>>
>> In multipath environments, administrators often need continuous
>> visibility into path state, ANA status, queue depth, link speed, and
>> error counters. Today, this typically requires repeatedly invoking
>> commands or relying on ad-hoc tooling, making it harder to quickly
>> identify issues.
>>
>> This patch series introduces "nvme top", a tool for real-time monitoring
>> of NVMe devices and fabrics paths, similar in spirit to tools such as
>> top or iotop. The goal is to provide a continuously updating view of
>> device and path health, enabling faster detection of link degradation,
>> multipath imbalances, and transient failures.
>>
>> The series first adds the necessary building blocks for supporting a
>> top-like dashboard. The initial patches extend the table APIs (including
>> support for additional data types such as unsigned, long, float, and
>> double) and introduce a generic dashboard framework. The final patch
>> adds the nvme top command built on top of this framework.
>>
>> Future work:
>> - Export NVMe statistics to external monitoring systems (e.g. Grafana).
>> - Improve topology change detection in multipath configurations. The
>> current implementation relies on kobject uevents for topology change,
>> but namespace path add/delete events are not exported by the kernel
>> since they are associated with hidden gendisk kobjects. This may
>> require explicit uevent generation from the NVMe driver for namespace
>> path changes.
>> - Wire nvme top into an MCP pipeline and feed it to an LLM.
>
> Nice. However, I think the traddr information is missing. Often the network
> has routing issues for a specific IP address. This tool should show this.
This tool prints traddr but NOT host_traddr. Did you mean we should also print host_traddr?
If so, I think that is a fair ask.
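
To make the traddr vs. host_traddr distinction concrete, here is a minimal sketch of splitting a comma-separated transport address string into its fields. It assumes a "key=value,key=value" layout similar to what fabrics controllers expose in their sysfs address attribute. The helper name parse_address and the sample addresses are hypothetical and are not taken from the patch series.

```python
def parse_address(address: str) -> dict[str, str]:
    """Split a 'key=value,key=value' transport address string into a dict.

    Assumption: fields are comma-separated and each field is 'key=value'.
    Fields without '=' are ignored.
    """
    fields: dict[str, str] = {}
    for item in address.strip().split(","):
        key, sep, value = item.partition("=")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


# Hypothetical sample; real values depend on how the controller was connected.
sample = "traddr=192.168.1.10,trsvcid=4420,host_traddr=192.168.1.20"
info = parse_address(sample)

# host_traddr is optional, so fall back to "-" when it is absent.
print(info.get("traddr", "-"), info.get("host_traddr", "-"))
```

Since host_traddr only appears when it was specified at connect time, a display column for it would need a placeholder for paths that do not carry it.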
Thanks,
--Nilay