[PATCH 0/7] nvme-cli: add nvme top command for real-time monitoring
Sagi Grimberg
sagi at grimberg.me
Sun May 10 15:34:40 PDT 2026
On 30/04/2026 13:52, Nilay Shroff wrote:
> Hi,
>
> Monitoring NVMe devices and paths in production is currently limited to
> static snapshots via nvme-cli. While this is sufficient for basic
> inspection, it is not ideal for NVMe-oF (fabrics) deployments where path
> conditions can change dynamically due to varying network latency,
> congestion, or link failures.
>
> In multipath environments, administrators often need continuous
> visibility into path state, ANA status, queue depth, link speed, and
> error counters. Today, this typically requires repeatedly invoking
> commands or relying on ad-hoc tooling, making it harder to quickly
> identify issues.
>
> This patch series introduces "nvme top", a tool for real-time monitoring
> of NVMe devices and fabrics paths, similar in spirit to tools such as
> top or iotop. The goal is to provide a continuously updating view of
> device and path health, enabling faster detection of link degradation,
> multipath imbalances, and transient failures.
>
> The series first adds the necessary building blocks for supporting a
> top-like dashboard. The initial patches extend the table APIs (including
> support for additional data types such as unsigned, long, float, and
> double) and introduce a generic dashboard framework. The final patch
> adds the nvme top command built on top of this framework.
>
> Future work:
> - Export NVMe statistics to external monitoring systems (e.g. Grafana).
> - Improve topology change detection in multipath configurations. The
>   current implementation relies on kobject uevents for topology change,
>   but namespace path add/delete events are not exported by the kernel
>   since they are associated with hidden gendisk kobjects. This may
>   require explicit uevent generation from the NVMe driver for namespace
>   path changes.
> - Wire nvme top into an MCP pipeline and feed it to an LLM
Nice. However, I think the traddr information is missing. The network often
has routing issues for a specific IP, and this tool should show that.