[PATCHv3 0/7] nvme: export additional diagnostic counters via sysfs
John Garry
john.g.garry at oracle.com
Mon Mar 9 08:32:10 PDT 2026
On 08/03/2026 18:55, Nilay Shroff wrote:
>> Last thought, as you are probably aware, John Garry is proposing to
>> lift the nvme multipath into a generic library, which suggests many of
>> these events would also need to be generic. Should some of these, like
>> error and retry counts, be appended to the generic disk stats instead?
>
Thanks for the mention.
> Yes I am aware about libmultipath work.
> I agree that retry and error counters might conceptually fit into
> generic disk statistics. However the intent of these diagnostic counters
> is to capture all relevant events, including passthrough commands.
>
> Passthrough requests are typically not accounted for in generic disk
> statistics, which makes that interface unsuitable for these counters.
> Additionally some counters are reported at the controller level, and
> controllers do not have an associated gendisk or block device.
>
> For these reasons exporting them through the dedicated sysfs interfaces
> appears to be the most appropriate approach.
From the current list of proposed counters, here are my thoughts on each
with respect to SCSI:

"nvme: export command retry count"
The ACRE feature that this is based on is not relevant to SCSI, and I
would be reluctant to add such a counter for scsi_devices.

"nvme: export multipath failover "
I think that this could be added for the scsi_mpath_device class.

"nvme: export command error counters "
Similar to "nvme: export command retry count".

"nvme: export I/O requeue count when no path is available "
I think that this could be added for the scsi_mpath_device class.

"nvme: export I/O failure"
Not really relevant to SCSI, or rather more relevant to SCSI low-level
drivers (which I would not want to expose as an ABI for SCSI multipath).

"nvme: export controller reset event count "
Same as "nvme: export I/O failure".

"nvme: export controller reconnect "
Again, same as "nvme: export I/O failure".
BTW, I think that the counters should be atomic. Otherwise concurrent
increments can be lost and the results will not be accurate. And, as
mentioned, none seem to be in the fastpath, so I don't see a reason not
to make them atomic.
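
To illustrate the point about atomics, here is a minimal userspace sketch
using C11 atomics and pthreads. It is not code from the series: the struct,
the field and the function names are all hypothetical, and in the kernel the
counter would more likely be an atomic_t or atomic64_t updated with
atomic_inc() or atomic64_inc().

```c
/*
 * Userspace sketch only. All names here (diag_counters, failover,
 * record_events, run_counter_demo) are hypothetical and do not come
 * from the patch series.
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

#define NR_THREADS 4
#define NR_EVENTS  100000

/* Hypothetical diagnostic counter, e.g. a multipath failover count. */
struct diag_counters {
	atomic_uint_fast64_t failover;
};

/* Each thread records NR_EVENTS events on the shared counter. */
static void *record_events(void *arg)
{
	struct diag_counters *c = arg;

	for (int i = 0; i < NR_EVENTS; i++) {
		/* Relaxed ordering suffices: only the total matters. */
		atomic_fetch_add_explicit(&c->failover, 1,
					  memory_order_relaxed);
	}
	return NULL;
}

/*
 * Run NR_THREADS concurrent writers and return the final count.
 * The result is lower than NR_THREADS * NR_EVENTS only if a thread
 * could not be created.
 */
static uint64_t run_counter_demo(void)
{
	struct diag_counters c;
	pthread_t tid[NR_THREADS];
	int started = 0;

	atomic_init(&c.failover, 0);

	for (int i = 0; i < NR_THREADS; i++) {
		if (pthread_create(&tid[i], NULL, record_events, &c) != 0)
			break;
		started++;
	}
	for (int i = 0; i < started; i++)
		pthread_join(tid[i], NULL);

	return atomic_load_explicit(&c.failover, memory_order_relaxed);
}
```

With the atomic type, run_counter_demo() should return exactly
NR_THREADS * NR_EVENTS. If the field were a plain uint64_t incremented with
"c->failover++", the read-modify-write sequences from different threads
could interleave and the total could come up short.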
Finally, some of these counters seem to me to be more suitable for
debugfs than for sysfs.
Cheers!