[PATCH 0/7] nvme-cli: add nvme top command for real-time monitoring
Sagi Grimberg
sagi at grimberg.me
Mon May 11 05:54:48 PDT 2026
On 11/05/2026 14:59, Nilay Shroff wrote:
> On 5/11/26 4:04 AM, Sagi Grimberg wrote:
>>
>>
>> On 30/04/2026 13:52, Nilay Shroff wrote:
>>> Hi,
>>>
>>> Monitoring NVMe devices and paths in production is currently limited to
>>> static snapshots via nvme-cli. While this is sufficient for basic
>>> inspection, it is not ideal for NVMe-oF (fabrics) deployments where path
>>> conditions can change dynamically due to varying network latency,
>>> congestion, or link failures.
>>>
>>> In multipath environments, administrators often need continuous
>>> visibility into path state, ANA status, queue depth, link speed, and
>>> error counters. Today, this typically requires repeatedly invoking
>>> commands or relying on ad-hoc tooling, making it harder to quickly
>>> identify issues.
>>>
>>> This patch series introduces "nvme top", a tool for real-time monitoring
>>> of NVMe devices and fabrics paths, similar in spirit to tools such as
>>> top or iotop. The goal is to provide a continuously updating view of
>>> device and path health, enabling faster detection of link degradation,
>>> multipath imbalances, and transient failures.
>>>
>>> The series first adds the necessary building blocks for supporting a
>>> top-like dashboard. The initial patches extend the table APIs (including
>>> support for additional data types such as unsigned, long, float, and
>>> double) and introduce a generic dashboard framework. The final patch
>>> adds the nvme top command built on top of this framework.
>>>
>>> Future work:
>>> - Export NVMe statistics to external monitoring systems (e.g. Grafana).
>>> - Improve topology change detection in multipath configurations. The
>>>   current implementation relies on kobject uevents for topology change,
>>>   but namespace path add/delete events are not exported by the kernel
>>>   since they are associated with hidden gendisk kobjects. This may
>>>   require explicit uevent generation from the NVMe driver for namespace
>>>   path changes.
>>> - Wire nvme top into an MCP pipeline and feed it to an LLM
>>
>> Nice. However, I think that the traddr information is missing. Often the
>> network has routing issues for a specific IP. This tool should show this.
>
> This tool prints the traddr but NOT host_traddr. Did you mean we
> should print host_traddr?
> If yes, then I think that should be a fair ask.
I didn't see the traddr...
Also, ctrl+c is not exiting, which is annoying ;)