[PATCH v8 7/8] nvme: sysfs: emit the marginal path state in show_state()

Hannes Reinecke hare at suse.de
Mon Jul 21 23:41:55 PDT 2025


On 7/22/25 04:57, Keith Busch wrote:
> On Wed, Jul 16, 2025 at 08:07:51AM +0200, Hannes Reinecke wrote:
>> On 7/15/25 22:03, Keith Busch wrote:
>>>
>>> What you're describing is a "path" state, not a controller state, which
>>> is why I'm suggesting the "ana_state" attribute since nothing else
>>> represents the path fitness. If nvme can't describe this condition, then
>>> maybe it should?
>>>
>> We probably could, but that feels a bit cumbersome.
>> Thing is, the FPIN LI (link integrity) message is just one of a set of
>> possible messages (congestion is another, but even more are defined).
>> If we added a separate ANA state for that, the question would arise of
>> how the other message types should fit in.
>> From a conceptual side, FPIN LI really is equivalent to a flaky
>> path, which can happen at any time without any specific information
>> anyway.
>> That again makes it questionable whether it should be specified in terms
>> of ANA states.
> 
> I see. Re-reading ANA, it is more aligned to describing a controller as
> active/passive or primary/secondary to the backing storage access rather
> than the state of the host nexus, so I agree it's not well suited
> for an ANA state. :(
>   
>>> Where does this 'FPIN LI' message originate from? The endpoint or
>>> something in between? If it's the endpoint (or if both sides get the same
>>> message?), then an ANA state change to non-optimal should be possible, no? And
>>> we already have the infrastructure to react to changing ANA states, so
>>> you can transition to optimal if something gets repaired.
>>
>> It's typically generated by the fabric/switch once it detects a link
>> integrity problem on one of the links on a given path.
>>
>> As mentioned above, it really is an attempt to codify the 'flaky path'
>> scenario, where occasional errors are generated but I/O remains
>> possible. So it really is an overlay over the ANA states, as _any_
>> path might be affected.
>> This discussion only centered on 'optimal' paths, as our path
>> selectors really only care about optimized paths; non-optimized
>> paths are not considered here,
>> which might skew the view of this patchset somewhat.
> 
> Okay, but can we call it "degraded" instead of "marginal"? The latter
> implies the poor quality is endemic to that path rather than a temporary
> condition.

Sure we can.
(Although technically it _is_ endemic as it won't change without
user interaction. But I digress :-)
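
Something like this, roughly (just a sketch of the sysfs side; the
structure, the 'degraded' flag, and the attribute wiring below are made
up for illustration, not taken from the actual patch):

/*
 * Rough sketch only.  Shows one way a path-level sysfs 'state' attribute
 * could append a "degraded" qualifier to the normal state string once the
 * transport has flagged the path (e.g. after an FPIN LI notification).
 * The structure and the 'degraded' flag are made up for illustration.
 */
#include <linux/device.h>
#include <linux/sysfs.h>

struct example_path {
	const char	*state;		/* normal path state, e.g. "live" */
	bool		degraded;	/* set when the fabric reports link integrity trouble */
};

static ssize_t state_show(struct device *dev, struct device_attribute *attr,
			  char *buf)
{
	struct example_path *p = dev_get_drvdata(dev);

	/* keep the existing state string, only append a qualifier */
	if (p->degraded)
		return sysfs_emit(buf, "%s degraded\n", p->state);

	return sysfs_emit(buf, "%s\n", p->state);
}
static DEVICE_ATTR_RO(state);

The point being that the existing state string stays untouched and the
qualifier only shows up once the fabric has flagged the path.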

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare at suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich


