[PATCHv3 0/7] nvme: export additional diagnostic counters via sysfs

Venkat venkat88 at linux.ibm.com
Sun Feb 22 04:36:38 PST 2026



> On 20 Feb 2026, at 11:18 PM, Nilay Shroff <nilay at linux.ibm.com> wrote:
> 
> Hi,
> 
> The NVMe driver encounters various events and conditions during normal
> operation that are either not tracked today or not exposed to userspace
> via sysfs. Lack of visibility into these events can make it difficult to
> diagnose subtle issues related to controller behavior, multipath
> stability, and I/O reliability.
> 
> This patchset adds several diagnostic counters that provide improved
> observability into NVMe behavior. These counters are intended to help
> users understand events such as transient path unavailability,
> controller retries/reconnect/reset, failovers, and I/O failures. They
> can also be consumed by monitoring tools such as nvme-top.
> 
> Specifically, this series proposes to export the following counters via
> sysfs:
>  - Command retry count
>  - Multipath failover count
>  - Command error count
>  - I/O requeue count
>  - I/O failure count
>  - Controller reset event counts
>  - Controller reconnect counts
> 
> The patchset consists of seven patches:
>  Patch 1: Export command retry count
>  Patch 2: Export multipath failover count
>  Patch 3: Export command error count
>  Patch 4: Export I/O requeue count
>  Patch 5: Export I/O failure count
>  Patch 6: Export controller reset event counts
>  Patch 7: Export controller reconnect event count
> 
> Please note that this patchset doesn't make any functional change; it
> only exports the relevant counters to user space via sysfs.
> 
> As usual, feedback/comments/suggestions are welcome!
> 
> Changes from v2:
>  - Allow users to write to the sysfs attributes so that the stat
>    counters can be reset, if needed (Sagi)
>  - The controller reconnect counter nr_reconnects is reset to zero
>    once the connection is re-established, so instead of exposing
>    nr_reconnects via sysfs, introduce a new counter which accumulates
>    the reconnect attempts and export this accumulated counter via
>    sysfs (Sagi)
> Link to v2: https://lore.kernel.org/all/20260205124810.682559-1-nilay@linux.ibm.com/
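
A simplified userspace model of the two v2 changes above, for illustration only. It is not the patch code: the struct and function names (hypo_ctrl, hypo_reconnect_attempt(), and so on) are hypothetical. It also assumes that a write to the sysfs attribute clears the accumulated counter to zero; the exact value the patch accepts on write is not stated in the cover letter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical controller state; names are illustrative, not the patch's. */
struct hypo_ctrl {
	unsigned int nr_reconnects;	/* attempts in the current outage only */
	size_t reconnect_events;	/* accumulated attempts, what sysfs would show */
};

/* Each reconnect attempt bumps both counters. */
static void hypo_reconnect_attempt(struct hypo_ctrl *c)
{
	assert(c != NULL);
	c->nr_reconnects++;
	c->reconnect_events++;	/* overflow handling omitted in this sketch */
}

/* On a successful reconnect only the per-outage counter goes back to zero. */
static void hypo_reconnect_success(struct hypo_ctrl *c)
{
	assert(c != NULL);
	c->nr_reconnects = 0;
}

/* Assumed effect of a user write to the sysfs attribute: clear the total. */
static void hypo_sysfs_reset(struct hypo_ctrl *c)
{
	assert(c != NULL);
	c->reconnect_events = 0;
}
```

After two attempts followed by a success, nr_reconnects is back to 0 while reconnect_events still holds 2, which is why the accumulated counter is the one worth exporting.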
> 
> Changes from v1:
>  - Remove export of stats for admin command retry count (Keith)
>  - Use size_add() to ensure stat counters don't overflow (Keith)
> Link to v1: https://lore.kernel.org/all/20260130182028.885089-1-nilay@linux.ibm.com/  
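
The size_add() change means a stat counter saturates at SIZE_MAX instead of wrapping back to a small value. The sketch below reproduces that saturating behaviour in userspace. The names sat_add() and hypo_stat_add() are hypothetical; the kernel helper itself is size_add() in include/linux/overflow.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Saturating size_t addition: if a + b would overflow, return SIZE_MAX
 * instead of letting the result wrap around.
 */
static size_t sat_add(size_t a, size_t b)
{
	if (a > SIZE_MAX - b)
		return SIZE_MAX;
	return a + b;
}

/* Hypothetical helper: bump a stat counter by n without wrapping. */
static void hypo_stat_add(size_t *ctr, size_t n)
{
	assert(ctr != NULL);
	*ctr = sat_add(*ctr, n);
}
```

A counter that sticks at its maximum still tells a monitoring tool "very many", whereas a wrapped counter would silently look like a fresh, low value.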
> 
> Nilay Shroff (7):
>  nvme: export command retry count via sysfs
>  nvme: export multipath failover count via sysfs
>  nvme: export command error counters via sysfs
>  nvme: export I/O requeue count when no path is available via sysfs
>  nvme: export I/O failure count when no path is available via sysfs
>  nvme: export controller reset event count via sysfs
>  nvme: export controller reconnect event count via sysfs
> 
> drivers/nvme/host/core.c      |  18 +++-
> drivers/nvme/host/fc.c        |   5 +
> drivers/nvme/host/multipath.c |  89 ++++++++++++++++++
> drivers/nvme/host/nvme.h      |  13 ++-
> drivers/nvme/host/rdma.c      |   4 +
> drivers/nvme/host/sysfs.c     | 167 ++++++++++++++++++++++++++++++++++
> drivers/nvme/host/tcp.c       |   3 +
> 7 files changed, 297 insertions(+), 2 deletions(-)
> 
> -- 
> 2.52.0
> 
> 

Hello Nilay,

I tested this patch series and found that a couple of attributes are missing.

Missing diagnostic counters:

1. I/O requeue count
2. I/O failure count

All the remaining diagnostic counters are exposed via sysfs properly.

Controller-level counters observed:
  - reset_events
  - reconnect_events
  - command_error_count

Namespace-instance counters observed:
  - command_retries
  - multipath_failover_count
  - command_error_count
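
The presence check above can be automated with a small helper along these lines. This is a sketch only: count_missing_attrs() is a hypothetical name, and the attribute lists contain just the names that appear in the listings in this report. The sysfs names of the two missing counters are not known from this report, so they are not included.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Counter attributes seen in the listings in this report. */
static const char *const ctrl_attrs[] = {
	"reset_events", "reconnect_events", "command_error_count",
};
static const char *const ns_attrs[] = {
	"command_retries", "multipath_failover_count", "command_error_count",
};

/*
 * Print every attribute in names[] that does not exist under dir and
 * return how many were missing.
 */
static int count_missing_attrs(const char *dir, const char *const names[],
			       size_t n)
{
	char path[4096];
	int missing = 0;

	assert(dir != NULL && names != NULL);
	for (size_t i = 0; i < n; i++) {
		snprintf(path, sizeof(path), "%s/%s", dir, names[i]);
		if (access(path, F_OK) != 0) {
			printf("missing: %s\n", path);
			missing++;
		}
	}
	return missing;
}
```

For example, count_missing_attrs("/sys/class/nvme/nvme3", ctrl_attrs, sizeof(ctrl_attrs) / sizeof(ctrl_attrs[0])) checks the controller directory, and the same call with ns_attrs against /sys/class/nvme/nvme3/nvme3c3n8 checks a path device.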


Logs:

ll /sys/class/nvme/nvme3/ 
total 0
-r--r--r-- 1 root root 65536 Feb 22 05:49 address
-r--r--r-- 1 root root 65536 Feb 22 05:58 cntlid
-r--r--r-- 1 root root 65536 Feb 22 05:49 cntrltype
-rw-r--r-- 1 root root 65536 Feb 22 06:10 command_error_count
-rw-r--r-- 1 root root 65536 Feb 22 05:58 ctrl_loss_tmo
-r--r--r-- 1 root root 65536 Feb 22 05:49 dctype
--w------- 1 root root 65536 Feb 22 05:58 delete_controller
-r--r--r-- 1 root root 65536 Feb 22 05:58 dev
lrwxrwxrwx 1 root root     0 Feb 22 05:50 device -> ../../ctl
-rw-r--r-- 1 root root 65536 Feb 22 05:58 fast_io_fail_tmo
-r--r--r-- 1 root root 65536 Feb 22 05:49 firmware_rev
-r--r--r-- 1 root root 65536 Feb 22 05:51 hostid
-r--r--r-- 1 root root 65536 Feb 22 05:51 hostnqn
-r--r--r-- 1 root root 65536 Feb 22 05:58 kato
-r--r--r-- 1 root root 65536 Feb 22 05:49 model
-r--r--r-- 1 root root 65536 Feb 22 05:49 numa_node
drwxr-xr-x 9 root root     0 Feb 22 05:49 nvme3c3n1
drwxr-xr-x 9 root root     0 Feb 22 05:49 nvme3c3n10
drwxr-xr-x 9 root root     0 Feb 22 05:49 nvme3c3n2
drwxr-xr-x 9 root root     0 Feb 22 05:49 nvme3c3n3
drwxr-xr-x 9 root root     0 Feb 22 05:49 nvme3c3n4
drwxr-xr-x 9 root root     0 Feb 22 05:49 nvme3c3n5
drwxr-xr-x 9 root root     0 Feb 22 05:49 nvme3c3n6
drwxr-xr-x 9 root root     0 Feb 22 05:49 nvme3c3n7
drwxr-xr-x 9 root root     0 Feb 22 05:49 nvme3c3n8
drwxr-xr-x 9 root root     0 Feb 22 05:49 nvme3c3n9
-rw-r--r-- 1 root root 65536 Feb 22 05:58 passthru_err_log_enabled
drwxr-xr-x 2 root root     0 Feb 22 05:58 power
-r--r--r-- 1 root root 65536 Feb 22 05:49 queue_count
-rw-r--r-- 1 root root 65536 Feb 22 05:58 reconnect_delay
-rw-r--r-- 1 root root 65536 Feb 22 06:11 reconnect_events
--w------- 1 root root 65536 Feb 22 05:58 rescan_controller
--w------- 1 root root 65536 Feb 22 06:11 reset_controller
-rw-r--r-- 1 root root 65536 Feb 22 06:10 reset_events
-r--r--r-- 1 root root 65536 Feb 22 05:49 serial
-r--r--r-- 1 root root 65536 Feb 22 05:49 sqsize
-r--r--r-- 1 root root 65536 Feb 22 05:49 state
-r--r--r-- 1 root root 65536 Feb 22 05:51 subsysnqn
lrwxrwxrwx 1 root root     0 Feb 22 05:49 subsystem -> ../../../../../class/nvme
-r--r--r-- 1 root root 65536 Feb 22 05:51 transport
-rw-r--r-- 1 root root 65536 Feb 22 05:49 uevent


ll /sys/class/nvme/nvme3/nvme3c3n8
total 0
-r--r--r--  1 root root 65536 Feb 22 06:02 alignment_offset
-r--r--r--  1 root root 65536 Feb 22 05:51 ana_grpid
-r--r--r--  1 root root 65536 Feb 22 05:51 ana_state
-r--r--r--  1 root root 65536 Feb 22 06:02 capability
-rw-r--r--  1 root root 65536 Feb 22 06:07 command_error_count
-rw-r--r--  1 root root 65536 Feb 22 06:07 command_retries
-r--r--r--  1 root root 65536 Feb 22 06:02 csi
lrwxrwxrwx  1 root root     0 Feb 22 05:50 device -> ../../nvme3
-r--r--r--  1 root root 65536 Feb 22 06:02 discard_alignment
-r--r--r--  1 root root 65536 Feb 22 06:02 diskseq
-r--r--r--  1 root root 65536 Feb 22 06:02 events
-r--r--r--  1 root root 65536 Feb 22 06:02 events_async
-rw-r--r--  1 root root 65536 Feb 22 06:02 events_poll_msecs
-r--r--r--  1 root root 65536 Feb 22 06:02 ext_range
-r--r--r--  1 root root 65536 Feb 22 06:02 hidden
drwxr-xr-x  2 root root     0 Feb 22 06:02 holders
-r--r--r--  1 root root 65536 Feb 22 06:02 inflight
drwxr-xr-x  2 root root     0 Feb 22 06:02 integrity
-r--r--r--  1 root root 65536 Feb 22 06:02 metadata_bytes
drwxr-xr-x 18 root root     0 Feb 22 06:02 mq
-rw-r--r--  1 root root 65536 Feb 22 06:07 multipath_failover_count
-r--r--r--  1 root root 65536 Feb 22 06:02 nguid
-r--r--r--  1 root root 65536 Feb 22 06:02 nsid
-r--r--r--  1 root root 65536 Feb 22 06:02 numa_nodes
-r--r--r--  1 root root 65536 Feb 22 06:02 nuse
-r--r--r--  1 root root 65536 Feb 22 06:02 partscan
-rw-r--r--  1 root root 65536 Feb 22 06:02 passthru_err_log_enabled
drwxr-xr-x  2 root root     0 Feb 22 06:02 power
drwxr-xr-x  2 root root     0 Feb 22 05:49 queue
-r--r--r--  1 root root 65536 Feb 22 06:02 queue_depth
-r--r--r--  1 root root 65536 Feb 22 06:02 range
-r--r--r--  1 root root 65536 Feb 22 05:49 removable
-r--r--r--  1 root root 65536 Feb 22 06:02 ro
-r--r--r--  1 root root 65536 Feb 22 05:50 size
drwxr-xr-x  2 root root     0 Feb 22 06:02 slaves
-r--r--r--  1 root root 65536 Feb 22 06:02 stat
lrwxrwxrwx  1 root root     0 Feb 22 05:49 subsystem -> ../../../../../../class/block
drwxr-xr-x  2 root root     0 Feb 22 06:02 trace
-rw-r--r--  1 root root 65536 Feb 22 05:49 uevent
-r--r--r--  1 root root 65536 Feb 22 06:02 uuid
-r--r--r--  1 root root 65536 Feb 22 06:02 wwid


Regards,
Venkat.