[PATCHv3 0/7] nvme: export additional diagnostic counters via sysfs
Nilay Shroff
nilay at linux.ibm.com
Wed Mar 4 06:33:02 PST 2026
Hi Keith,
A gentle ping on this. I’ve incorporated the review comments,
and the series has already received Reviewed-by and Tested-by tags.
Could you please consider pulling it? Also, please let me know if
you have any further comments or if additional changes are needed.
Thanks,
--Nilay
On 2/20/26 11:18 PM, Nilay Shroff wrote:
> Hi,
>
> The NVMe driver encounters various events and conditions during normal
> operation that are either not tracked today or not exposed to userspace
> via sysfs. Lack of visibility into these events can make it difficult to
> diagnose subtle issues related to controller behavior, multipath
> stability, and I/O reliability.
>
> This patchset adds several diagnostic counters that provide improved
> observability into NVMe behavior. These counters are intended to help
> users understand events such as transient path unavailability,
> controller retries/reconnects/resets, failovers, and I/O failures. They
> can also be consumed by monitoring tools such as nvme-top.
>
> Specifically, this series proposes to export the following counters via
> sysfs:
> - Command retry count
> - Multipath failover count
> - Command error count
> - I/O requeue count
> - I/O failure count
> - Controller reset event counts
> - Controller reconnect counts
>
> The patchset consists of seven patches:
> Patch 1: Export command retry count
> Patch 2: Export multipath failover count
> Patch 3: Export command error count
> Patch 4: Export I/O requeue count
> Patch 5: Export I/O failure count
> Patch 6: Export controller reset event counts
> Patch 7: Export controller reconnect event count
>
> Please note that this patchset makes no functional change; it only
> exports the relevant counters to user space via sysfs.
>
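For readers who want to consume such counters from user space, a minimal
sketch of reading one follows. It assumes each attribute is a text file
holding a single decimal number, which the cover letter does not state.
read_counter() is a hypothetical helper, and the caller must supply the
attribute path, since no attribute names are given here.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: read one unsigned decimal counter from a
 * sysfs-style text file. Returns 0 on success, -1 on any failure.
 * Assumes the file holds a single decimal number.
 */
static int read_counter(const char *path, unsigned long long *value)
{
	char buf[64];
	char *end;
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	errno = 0;
	*value = strtoull(buf, &end, 10);
	if (errno != 0 || end == buf)
		return -1;	/* out of range, or no digits found */
	return 0;
}
```

A monitoring tool could call this periodically on an attribute under the
controller's sysfs directory and report the difference between samples.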
> As usual, feedback/comments/suggestions are welcome!
>
> Changes from v2:
> - Allow users to write to the sysfs attributes so that they can
> reset the stat counters, if needed (Sagi)
> - The controller reconnect counter nr_reconnects can be reset
> to zero once the connection is re-established, so instead of
> exposing nr_reconnects via sysfs, introduce a new counter
> that accumulates the reconnect attempts and export this
> accumulated counter via sysfs (Sagi)
> Link to v2: https://lore.kernel.org/all/20260205124810.682559-1-nilay@linux.ibm.com/
>
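The reasoning behind the second v2 change can be sketched in plain C.
This is only an illustration, not the driver code: struct ctrl_sketch
and total_reconnects are hypothetical names, and only nr_reconnects
comes from the text above.

```c
#include <stdint.h>

/* Hypothetical stand-in for controller state; not the driver's struct. */
struct ctrl_sketch {
	unsigned int nr_reconnects;	/* attempts in the current outage */
	uint64_t total_reconnects;	/* hypothetical accumulated counter */
};

/* Each reconnect attempt bumps both counters. */
static void reconnect_attempt(struct ctrl_sketch *c)
{
	c->nr_reconnects++;
	c->total_reconnects++;
}

/* A successful reconnect clears only the per-outage counter. */
static void reconnect_succeeded(struct ctrl_sketch *c)
{
	c->nr_reconnects = 0;
}
```

After three failed attempts, a success, and one further attempt, the
per-outage counter reads 1 while the accumulated counter reads 4, which
is why only the latter is useful as a long-running statistic.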
> Changes from v1:
> - Remove export of stats for admin command retry count (Keith)
> - Use size_add() to ensure stat counters don't overflow (Keith)
> Link to v1: https://lore.kernel.org/all/20260130182028.885089-1-nilay@linux.ibm.com/
>
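For context on the size_add() item: the kernel helper returns SIZE_MAX
when the sum would overflow, rather than wrapping around. A userspace
sketch of that saturating behaviour follows. sat_add() is a hypothetical
name used for illustration and is not the kernel implementation.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Saturating addition on size_t: returns a + b, or SIZE_MAX if the
 * sum would not fit. Illustrative stand-in for the kernel helper.
 */
static size_t sat_add(size_t a, size_t b)
{
	if (b > SIZE_MAX - a)
		return SIZE_MAX;	/* would overflow: clamp */
	return a + b;
}
```

With this scheme a counter that reaches its maximum stays there instead
of wrapping back to a small value that could be mistaken for a recent
reset.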
> Nilay Shroff (7):
> nvme: export command retry count via sysfs
> nvme: export multipath failover count via sysfs
> nvme: export command error counters via sysfs
> nvme: export I/O requeue count when no path is available via sysfs
> nvme: export I/O failure count when no path is available via sysfs
> nvme: export controller reset event count via sysfs
> nvme: export controller reconnect event count via sysfs
>
> drivers/nvme/host/core.c | 18 +++-
> drivers/nvme/host/fc.c | 5 +
> drivers/nvme/host/multipath.c | 89 ++++++++++++++++++
> drivers/nvme/host/nvme.h | 13 ++-
> drivers/nvme/host/rdma.c | 4 +
> drivers/nvme/host/sysfs.c | 167 ++++++++++++++++++++++++++++++++++
> drivers/nvme/host/tcp.c | 3 +
> 7 files changed, 297 insertions(+), 2 deletions(-)
>