[PATCH v8 7/8] nvme: sysfs: emit the marginal path state in show_state()

Hannes Reinecke hare at suse.de
Tue Jul 15 23:07:51 PDT 2025


On 7/15/25 22:03, Keith Busch wrote:
> On Tue, Jul 15, 2025 at 03:42:32PM -0400, John Meneghini wrote:
>> On 7/9/25 6:12 PM, Keith Busch wrote:
>>> On Wed, Jul 09, 2025 at 05:19:18PM -0400, Bryan Gurney wrote:
>>>> If a controller has received a link integrity or congestion event, and
>>>> has the NVME_CTRL_MARGINAL flag set, emit "marginal" in the state
>>>> instead of "live", to identify the marginal paths.
>>>
>>> IMO, this attribute looks more aligned to report in the ana_state
>>> instead of overriding the controller's state.
>>>
>>
>> We can't really do this because the ANA state is a documented protocol state.
>>
>> The linux controller state is purely a linux software defined state.  Unless
>> I am wrong, there is nothing in the NVMe specification which defines
>> the nvme_ctrl_state.
>
> Totally correct.
>   
>> This is purely a linux definition and we should be able to change it any way we want.
> 
> My kneejerk reaction is against adding new controller states. We have
> state checks sprinkled about, and special states just make that more
> fragile.
>   
Yeah, controller states are not a good fit. We've seen the issues when
trying to introduce a new state for firmware update.

>> We debated adding a new NVME_CTRL_MARGINAL state to this data structure,
>>
>> enum nvme_ctrl_state {
>>          NVME_CTRL_NEW,
>>          NVME_CTRL_LIVE,
>>          NVME_CTRL_RESETTING,
>>          NVME_CTRL_CONNECTING,
>>          NVME_CTRL_DELETING,
>>          NVME_CTRL_DELETING_NOIO,
>>          NVME_CTRL_DEAD,
>> };
>>
>> If you don't like the flag we can do that. However, that doesn't seem worth the effort since Hannes has this working now with a flag.
> 
> What you're describing is a "path" state, not a controller state which
> is why I'm suggesting the "ana_state" attribute since nothing else
> represents the path fitness. If nvme can't describe this condition, then
> maybe it should?
> 
We probably could, but that feels a bit cumbersome.
Thing is, the FPIN LI (link integrity) message is just one of a set of
possible messages (congestion is another, and more are defined).
If we added a separate ANA state for it, the question would be raised
of how the other messages would fit into that.
From a conceptual side, FPIN LI really is equivalent to a flaky
path, which can happen at any time without any specific information
anyway, again making it questionable whether it should be specified
in terms of ANA states.

> Where does this 'FPIN LI' message originate from? The end point or
> something in between? If it's the endpoint (or if both sides get the same
> message?), then an ANA state to non-optimal should be possible, no? And
> we already have the infrastructure to react to changing ANA states, so
> you can transition to optimal if something gets repaired.

It's typically generated by the fabric/switch once it detects a link
integrity problem on one of the links on a given path.

As mentioned above, it really is an attempt to codify the 'flaky path'
scenario, where occasional errors are generated but I/O remains
possible. So it really is an overlay over the ANA states, as _any_
path might be affected.
This discussion has only centered on 'optimized' paths because our path
selectors really only care about optimized paths; non-optimized
paths are not considered here, which might skew the view of this
patchset somewhat.
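
To make the 'overlay' idea concrete, here is a minimal userspace sketch of
reporting "marginal" from a flag rather than from a new controller state.
It is not the code from the patch: every demo_* name is made up for
illustration, the bool stands in for the NVME_CTRL_MARGINAL flag, and the
state strings are assumed to match what sysfs prints today.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative copy of the enum quoted above; not the kernel definition. */
enum demo_ctrl_state {
	DEMO_CTRL_NEW,
	DEMO_CTRL_LIVE,
	DEMO_CTRL_RESETTING,
	DEMO_CTRL_CONNECTING,
	DEMO_CTRL_DELETING,
	DEMO_CTRL_DELETING_NOIO,
	DEMO_CTRL_DEAD,
};

/* Hypothetical controller: the state machine is unchanged, the flag sits beside it. */
struct demo_ctrl {
	enum demo_ctrl_state state;
	bool marginal;	/* stand-in for the NVME_CTRL_MARGINAL flag */
};

/* Return the string a show_state()-like attribute would emit. */
static const char *demo_state_name(const struct demo_ctrl *ctrl)
{
	/* Assumed state strings; treat them as placeholders. */
	static const char *const names[] = {
		[DEMO_CTRL_NEW]           = "new",
		[DEMO_CTRL_LIVE]          = "live",
		[DEMO_CTRL_RESETTING]     = "resetting",
		[DEMO_CTRL_CONNECTING]    = "connecting",
		[DEMO_CTRL_DELETING]      = "deleting",
		[DEMO_CTRL_DELETING_NOIO] = "deleting (no IO)",
		[DEMO_CTRL_DEAD]          = "dead",
	};

	/* The flag only overrides the reported name while the controller is live. */
	if (ctrl->state == DEMO_CTRL_LIVE && ctrl->marginal)
		return "marginal";

	if ((size_t)ctrl->state < sizeof(names) / sizeof(names[0]) &&
	    names[ctrl->state])
		return names[ctrl->state];

	return "unknown state";
}
```

With this shape, none of the existing state checks need to learn about a
new enum value; a live controller with the flag set still compares equal
to DEMO_CTRL_LIVE and only the reported string differs.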

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare at suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich


