[PATCH v8 7/8] nvme: sysfs: emit the marginal path state in show_state()

Keith Busch kbusch at kernel.org
Mon Jul 21 19:57:27 PDT 2025


On Wed, Jul 16, 2025 at 08:07:51AM +0200, Hannes Reinecke wrote:
> On 7/15/25 22:03, Keith Busch wrote:
> > 
> > What you're describing is a "path" state, not a controller state which
> > is why I'm suggesting the "ana_state" attribute since nothing else
> > represents the path fitness. If nvme can't describe this condition, then
> > maybe it should?
> > 
> We probably could, but that feels a bit cumbersome.
> Thing is, the FPIN LI (link integrity) message is just one of a set of
> possible messages (congestion is another, and still more are defined).
> If we added a separate ANA state for it, the question would arise of
> how states for the other messages would fit in.
> Conceptually, FPIN LI really is equivalent to a flaky
> path, which can occur at any time without any specific information
> anyway.
> That again makes it questionable whether it should be specified in terms
> of ANA states.

I see. Re-reading ANA, it is more aligned to describing a controller as
active/passive or primary/secondary with respect to backing storage access
than to describing the state of the host nexus, so I agree it's not well
suited for an ANA state. :(
 
> > Where does this 'FPIN LI' message originate from? The endpoint or
> > something in between? If it's the endpoint (or if both sides get the same
> > message?), then an ANA state change to non-optimal should be possible, no?
> > And we already have the infrastructure to react to changing ANA states, so
> > you can transition back to optimal if something gets repaired.
> 
> It's typically generated by the fabric/switch once it detects a link
> integrity problem on one of the links on a given path.
> 
> As mentioned above, it really is an attempt to codify the 'flaky path'
> scenario, where errors are occasionally generated but I/O remains
> possible. So it really is an overlay over the ANA states, as _any_
> path might be affected.
> This discussion has centered only on 'optimized' paths, as our path
> selectors really only care about optimized paths; non-optimized
> paths are not considered here.
> That might skew the view of this patchset somewhat.

Okay, but can we call it "degraded" instead of "marginal"? The latter
implies the poor quality is endemic to that path rather than a temporary
condition.



More information about the Linux-nvme mailing list