[PATCHv3 0/7] nvme: export additional diagnostic counters via sysfs

Nilay Shroff nilay at linux.ibm.com
Fri Feb 20 09:48:45 PST 2026


Hi,

The NVMe driver encounters various events and conditions during normal
operation that are either not tracked today or not exposed to userspace
via sysfs. Lack of visibility into these events can make it difficult to
diagnose subtle issues related to controller behavior, multipath
stability, and I/O reliability.

This patchset adds several diagnostic counters that provide improved
observability into NVMe behavior. These counters are intended to help
users understand events such as transient path unavailability,
controller retries/reconnects/resets, failovers, and I/O failures. They
can also be consumed by monitoring tools such as nvme-top.

Specifically, this series proposes to export the following counters via
sysfs:
  - Command retry count
  - Multipath failover count
  - Command error count
  - I/O requeue count
  - I/O failure count
  - Controller reset event counts
  - Controller reconnect counts

The patchset consists of seven patches:
  Patch 1: Export command retry count
  Patch 2: Export multipath failover count
  Patch 3: Export command error count
  Patch 4: Export I/O requeue count
  Patch 5: Export I/O failure count
  Patch 6: Export controller reset event counts
  Patch 7: Export controller reconnect event count

Please note that this patchset makes no functional change; it only
exports the relevant counters to user space via sysfs.

As usual, feedback/comments/suggestions are welcome!

Changes from v2:
  - Make the sysfs attributes writable so that users can reset the
    stat counters, if needed (Sagi)
  - The controller reconnect counter nr_reconnects can be reset to
    zero once the connection is re-established, so instead of
    exposing nr_reconnects via sysfs, introduce a new counter that
    accumulates the reconnect attempts and export that accumulated
    counter via sysfs (Sagi)
Link to v2: https://lore.kernel.org/all/20260205124810.682559-1-nilay@linux.ibm.com/
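
To illustrate the reconnect counter change, here is a minimal userspace
sketch and not the code from the patch. The struct and function names,
and the field reconnect_total, are hypothetical; only nr_reconnects is a
name taken from the description above. It shows why a counter that is
cleared on a successful reconnect loses history, while a separate
accumulating counter keeps it.

```c
#include <stdint.h>

/* Hypothetical stand-in for controller state; not the real struct nvme_ctrl. */
struct ctrl_sketch {
	unsigned int nr_reconnects;   /* attempts in the current outage */
	uint64_t reconnect_total;     /* hypothetical accumulated counter */
};

/* Called for every reconnect attempt: bump both counters. */
static void sketch_reconnect_attempt(struct ctrl_sketch *c)
{
	c->nr_reconnects++;
	c->reconnect_total++;
}

/* Called once the connection is re-established: only the per-outage
 * counter is cleared, the accumulated total is retained. */
static void sketch_reconnect_succeeded(struct ctrl_sketch *c)
{
	c->nr_reconnects = 0;
}
```

With three attempts, a successful reconnect, and then one more attempt,
nr_reconnects reads 1 whereas the accumulated counter reads 4.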

Changes from v1:
  - Remove export of stats for admin command retry count (Keith)
  - Use size_add() to ensure stat counters don't overflow (Keith)
Link to v1: https://lore.kernel.org/all/20260130182028.885089-1-nilay@linux.ibm.com/  
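
For readers unfamiliar with size_add(): the kernel helper adds two size_t
values and saturates at SIZE_MAX instead of wrapping around. The function
below is a portable userspace sketch of that behaviour under the
hypothetical name sketch_size_add(); it is not the kernel implementation
and does not show how the patches call the helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Saturating add: returns SIZE_MAX if a + b would overflow size_t. */
static size_t sketch_size_add(size_t a, size_t b)
{
	if (b > SIZE_MAX - a)
		return SIZE_MAX;
	return a + b;
}
```

A counter updated this way sticks at SIZE_MAX once it reaches the limit,
so a very large value is never mistaken for a small one after wraparound.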

Nilay Shroff (7):
  nvme: export command retry count via sysfs
  nvme: export multipath failover count via sysfs
  nvme: export command error counters via sysfs
  nvme: export I/O requeue count when no path is available via sysfs
  nvme: export I/O failure count when no path is available via sysfs
  nvme: export controller reset event count via sysfs
  nvme: export controller reconnect event count via sysfs

 drivers/nvme/host/core.c      |  18 +++-
 drivers/nvme/host/fc.c        |   5 +
 drivers/nvme/host/multipath.c |  89 ++++++++++++++++++
 drivers/nvme/host/nvme.h      |  13 ++-
 drivers/nvme/host/rdma.c      |   4 +
 drivers/nvme/host/sysfs.c     | 167 ++++++++++++++++++++++++++++++++++
 drivers/nvme/host/tcp.c       |   3 +
 7 files changed, 297 insertions(+), 2 deletions(-)

-- 
2.52.0