[PATCHv3 0/7] nvme: export additional diagnostic counters via sysfs

Sun Mar 8 11:55:49 PDT 2026

On 3/6/26 9:32 PM, Keith Busch wrote:
> On Wed, Mar 04, 2026 at 08:03:02PM +0530, Nilay Shroff wrote:
>> Hi Keith,
>>
>> A gentle ping on this. I´ve incorporated the review comments,
>> and the series has already received Reviewed-by and Tested-by tags.
>>
>> Could you please consider pulling it? Also, please let me know if
>> you have any further comments or if additional changes are needed.
> 
> Thanks, I was hoping free time would show up to let me look closer at
> this, but that doesn't appear to be happening, so looks like I have to
> force the time :)
> 
Thanks for your time and review!

> We always need to be a bit cautious on adding user visible attributes,
> so just throwing out some thoughts on this feature.
> 
> The event counters are not atomic. I know these are informational, so
> maybe it's not important to be 100% accurate in the reporting, but I
> don't know. Maybe you do want accuracy in which case using atomic_long_t
> might be the right type. I think that should be okay to use since none
> of these counters are in the normal fast path.

Currently the counters are implemented using size_t and updated through
the size_add() helper to avoid overflow (as you suggested earlier in the
review). Since these counters are not updated in performance critical
paths, switching them to atomic_long_t should be fine. I'll  update the
implementation to use atomic counters.

> 
> The names of the attributes are a bit inconsistent. Some have a "_count"
> suffix, some have "_events" suffix, and some have no suffix at all.
> 
Yes, I originally chose names that I felt were descriptive while trying
to keep them concise. However I agree that consistent naming is more
important. I'd update all attributes to use a uniform "_count" suffix.

> In order to keep sysfs a bit cleaner, should we consider moving these
> attributes under a sub-directory specifically for reporting event
> counters?
> 
The counters are currently exported from different objects depending on
what they represent. Some are per-namespace path, some per-controller,
and others are associated with the namespace head. Because of this
separation, placing all counters under a single sub-directory would
not be feasible.

If the concern is reducing clutter in existing directories, we could
instead introduce a dedicated sub-directory under each object and export
the counters there. For example, we could create a "diag" directory under
the per-ns path, controller, and ns-head directories and then export the
respective counters under "diag". What do you suggest?

> Last thought, as you are probably aware, John Garry is proposing to
> lift the nvme multipath into a generic library, which suggests many of
> these events would also need to be generic. Should some of these, like
> error and retry counts, be appended to the generic disk stats instead?

Yes I am aware about libmultipath work.
I agree that retry and error counters might conceptually fit into
generic disk statistics. However the intent of these diagnostic counters
is to capture all relevant events, including passthrough commands.

Passthrough requests are typically not accounted for in generic disk
statistics, which makes that interface unsuitable for these counters.
Additionally some counters are reported at the controller level, and
controllers do not have an associated gendisk or block device.

For these reasons exporting them through the dedicated sysfs interfaces
appears to be the most appropriate approach.

Thanks,
--Nilay