[PATCHv3 0/7] nvme: export additional diagnostic counters via sysfs

Thu Mar 19 08:55:59 PDT 2026

Hi John,

Sorry for the late reply. It looks like your message went only to the
list (probably by mistake), so I only just saw it. I am adding back all
recipients.

On 3/9/26 9:02 PM, John Garry wrote:
> On 08/03/2026 18:55, Nilay Shroff wrote:
>>> Last thought, as you are probably aware, John Garry is proposing to
>>> lift the nvme multipath into a generic library, which suggests many of
>>> these events would also need to be generic. Should some of these, like
>>> error and retry counts, be appended to the generic disk stats instead?
>>
> 
> Thanks for the mention
> 
>> Yes I am aware about libmultipath work.
>> I agree that retry and error counters might conceptually fit into
>> generic disk statistics. However the intent of these diagnostic counters
>> is to capture all relevant events, including passthrough commands.
>>
>> Passthrough requests are typically not accounted for in generic disk
>> statistics, which makes that interface unsuitable for these counters.
>> Additionally some counters are reported at the controller level, and
>> controllers do not have an associated gendisk or block device.
>>
>> For these reasons exporting them through the dedicated sysfs interfaces
>> appears to be the most appropriate approach.
> 
>  From the current list of proposed counters, my thoughts per counter are WRT SCSI:
> "nvme: export command retry count"
> The ACRE which this is based on is not relevant (to SCSI), and I would be reluctant to add such a counter for scsi_devices
> 
> "nvme: export multipath failover "
> I think that this could be added for scsi_mpath_device class
Ack
> 
> "nvme: export command error counters "
> Similar as "nvme: export command retry count"
> 
> "nvme: export I/O requeue count when no path is available "
> I think that this could be added for scsi_mpath_device class
I think this one should be added under mpath_head; this counter
represents the number of I/Os that had to be requeued (i.e. placed on
mpath_head->requeue_list) because no path is currently available.

> 
> "nvme: export I/O failure"
> Not really relevant to SCSI, or more relevant to SCSI low-level drivers (which I would not want to expose as an ABI for SCSI multipath)
> 
I think this one could also be added under mpath_head; this counter
represents the number of I/Os that are forced to fail, for example
because all paths (reachable via the head node) were either deleted
or not usable at all.

> "nvme: export controller reset event count "
> Same as "nvme: export I/O failure"
> 
> "nvme: export controller reconnect "
> Again, same as "nvme: export I/O failure"
> 
> BTW, I think that the counters should be atomic - otherwise we are not getting accurate results. And, as is mentioned, none seem to be in the fastpath (so I don't know why not have them as atomic).
> 
Yes, I will make the counters atomic, as Keith suggested earlier.
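
As a rough illustration of the idea (not the actual patch code), here
is a minimal userspace sketch that uses C11 atomics as a stand-in for
the kernel's atomic_long_inc()/atomic_long_read() helpers. The struct,
field, and function names are hypothetical and are not taken from the
series:

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical per-controller diagnostic counters (illustrative names). */
struct diag_counters {
	atomic_uint_fast64_t nr_reset;
	atomic_uint_fast64_t nr_reconnect;
};

/*
 * Increment a counter. Relaxed ordering is enough here because the
 * counter is a pure statistic and does not order any other memory
 * accesses.
 */
static inline void diag_counter_inc(atomic_uint_fast64_t *c)
{
	atomic_fetch_add_explicit(c, 1, memory_order_relaxed);
}

/* Read a snapshot of the counter, e.g. from a sysfs show() handler. */
static inline uint64_t diag_counter_read(atomic_uint_fast64_t *c)
{
	return atomic_load_explicit(c, memory_order_relaxed);
}
```

Since none of these events is in the fast path, the cost of the atomic
read-modify-write should not matter in practice.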

> Finally, some of these counters seem to me to be more suitable for a debugfs (and not sysfs).
> 
You are correct, but these counters would be consumed by nvme-cli
(mostly by nvme-top), and debugfs may not always be available or
mounted on production systems. For that reason, exporting the metrics
through sysfs ensures they are consistently accessible in production
environments.
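
To illustrate the consumer side, a tool could read such a counter with
nothing more than a plain file read. This is only a sketch, assuming
each attribute exposes a single unsigned decimal value; read_counter()
is a hypothetical helper, not an existing nvme-cli function, and the
attribute path is left to the caller because the final names are not
settled here:

```c
#include <errno.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Read one unsigned decimal counter from a sysfs-style attribute file.
 * Returns 0 on success, or a negative errno value on failure.
 */
static int read_counter(const char *path, uint64_t *val)
{
	FILE *f = fopen(path, "r");
	int ret = 0;

	if (!f)
		return -errno;
	/* The attribute is assumed to hold a single decimal number. */
	if (fscanf(f, "%" SCNu64, val) != 1)
		ret = -EINVAL;
	fclose(f);
	return ret;
}
```

No debugfs mount or extra privileges beyond read access to the
attribute would be needed for this.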

Thanks,
--Nilay