[PATCHv3 0/7] nvme: export additional diagnostic counters via sysfs

Nilay Shroff nilay at linux.ibm.com
Wed Mar 4 06:33:02 PST 2026


Hi Keith,

A gentle ping on this. I’ve incorporated the review comments,
and the series has already received Reviewed-by and Tested-by tags.

Could you please consider pulling it? Also, please let me know if
you have any further comments or if additional changes are needed.

Thanks,
--Nilay

On 2/20/26 11:18 PM, Nilay Shroff wrote:
> Hi,
> 
> The NVMe driver encounters various events and conditions during normal
> operation that are either not tracked today or not exposed to userspace
> via sysfs. Lack of visibility into these events can make it difficult to
> diagnose subtle issues related to controller behavior, multipath
> stability, and I/O reliability.
> 
> This patchset adds several diagnostic counters that provide improved
> observability into NVMe behavior. These counters are intended to help
> users understand events such as transient path unavailability,
> failovers, I/O failures, and controller retries, reconnects, and
> resets. They can also be consumed by monitoring tools such as nvme-top.
> 
> Specifically, this series proposes to export the following counters via
> sysfs:
>    - Command retry count
>    - Multipath failover count
>    - Command error count
>    - I/O requeue count
>    - I/O failure count
>    - Controller reset event counts
>    - Controller reconnect counts
> 
> The patchset consists of seven patches:
>    Patch 1: Export command retry count
>    Patch 2: Export multipath failover count
>    Patch 3: Export command error count
>    Patch 4: Export I/O requeue count
>    Patch 5: Export I/O failure count
>    Patch 6: Export controller reset event counts
>    Patch 7: Export controller reconnect event count
> 
> Please note that this patchset doesn't make any functional change; it
> only exports the relevant counters to user space via sysfs.
> 
> As usual, feedback/comments/suggestions are welcome!
> 
> Changes from v2:
>    - Allow users to write to the sysfs attributes so that they can
>      reset the stat counters, if needed (Sagi)
>    - The controller reconnect counter nr_reconnects can be reset
>      to zero once the connection is re-established, so instead of
>      exposing nr_reconnects via sysfs, introduce a new counter that
>      accumulates the reconnect attempts and export this accumulated
>      counter via sysfs (Sagi)
> Link to v2: https://lore.kernel.org/all/20260205124810.682559-1-nilay@linux.ibm.com/
> 
> Changes from v1:
>    - Remove export of stats for admin command retry count (Keith)
>    - Use size_add() to ensure stat counters don't overflow (Keith)
> Link to v1: https://lore.kernel.org/all/20260130182028.885089-1-nilay@linux.ibm.com/
> 
> Nilay Shroff (7):
>    nvme: export command retry count via sysfs
>    nvme: export multipath failover count via sysfs
>    nvme: export command error counters via sysfs
>    nvme: export I/O requeue count when no path is available via sysfs
>    nvme: export I/O failure count when no path is available via sysfs
>    nvme: export controller reset event count via sysfs
>    nvme: export controller reconnect event count via sysfs
> 
>   drivers/nvme/host/core.c      |  18 +++-
>   drivers/nvme/host/fc.c        |   5 +
>   drivers/nvme/host/multipath.c |  89 ++++++++++++++++++
>   drivers/nvme/host/nvme.h      |  13 ++-
>   drivers/nvme/host/rdma.c      |   4 +
>   drivers/nvme/host/sysfs.c     | 167 ++++++++++++++++++++++++++++++++++
>   drivers/nvme/host/tcp.c       |   3 +
>   7 files changed, 297 insertions(+), 2 deletions(-)
> 
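For illustration only, the changelog items quoted above (an accumulated
reconnect counter that survives a successful reconnect, overflow-safe
increments in the spirit of size_add(), and a user-triggered reset) could
be sketched in user-space C roughly as follows. This is not code from the
patches: the struct and every demo_* name are hypothetical, and
__builtin_add_overflow is a GCC/Clang builtin used here as a stand-in for
the kernel helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for per-controller state; not the driver's struct. */
struct demo_ctrl {
	unsigned int nr_reconnects;	/* per-episode attempts, cleared on success */
	size_t reconnect_events;	/* hypothetical accumulated total for userspace */
};

/* Saturating add: clamps to SIZE_MAX instead of wrapping around. */
static size_t demo_sat_add(size_t a, size_t b)
{
	size_t sum;

	if (__builtin_add_overflow(a, b, &sum))
		return SIZE_MAX;
	return sum;
}

/* One reconnect attempt bumps both the episode and the accumulated count. */
static void demo_reconnect_attempt(struct demo_ctrl *c)
{
	c->nr_reconnects++;
	c->reconnect_events = demo_sat_add(c->reconnect_events, 1);
}

/* Connection re-established: only the per-episode counter is cleared. */
static void demo_reconnect_done(struct demo_ctrl *c)
{
	c->nr_reconnects = 0;
}

/* Models a user write that resets the exported statistic. */
static void demo_stat_reset(struct demo_ctrl *c)
{
	c->reconnect_events = 0;
}
```

The point of keeping two counters in this sketch is that the per-episode
value can drop back to zero without losing the history that a monitoring
tool would want to read.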



