NVMe over Fabrics host: behavior on presence of ANA Group in "change" state

Alex Talker alextalker at yandex.ru
Fri Feb 11 01:21:36 PST 2022


 > [FK> ] I think I'm missing a bunch of context here. What is the
 > original question? I take a stab at some assumptions: What is an
 > empty ANA group? That is an ANA Group with NO NSIDs associated with
 > that group. Meaning the "Number of NSID Values" field is cleared to
 > '0h' in the ANA Group Descriptor. That descriptor can be used to
 > update some host internal state information related to that ANA
 > group, but it has no impact on any I/O because there can be no I/O
 > (since there are no NSID values). So I'm not sure where that is
 > going (because RGO=1 also can return ANA Groups that have state, but
 > no attached namespaces (it's a way to get group state without any
 > NSID inventory requirements)).

That's exactly right: the "nnsids=0" case. I/O is certainly not a
problem for such a group.
I suppose the main point of contention is that when such a group is in
the "change" ANA state, the host (the "nvme-core" module) starts the
ANATT timer, which resets the controller when it expires.
I do not disagree that having such a group is "ugly"; rather, I argue
that the ANATT-related functionality should be invoked only in the
"nnsids>0" case, since only then is there a relation between the
"change" state and a namespace via "ANAGRPID".
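
To make the proposed condition concrete, here is a minimal Python model
of it. This is only a sketch and not kernel code: the names
AnaGroupDesc, count_change_groups and should_arm_anatt_timer are
hypothetical, and the skip_empty flag merely switches between the
behavior I described above and the one I am arguing for.

```python
from dataclasses import dataclass, field
from typing import List

ANA_CHANGE = 0x0F  # "change" ANA state value as defined by the NVMe spec


@dataclass
class AnaGroupDesc:
    """Simplified stand-in for an ANA Group Descriptor (hypothetical type)."""
    grpid: int
    state: int
    nsids: List[int] = field(default_factory=list)  # empty list means nnsids=0


def count_change_groups(descs, skip_empty):
    """Count groups in "change" state; a non-zero count arms the ANATT timer."""
    count = 0
    for desc in descs:
        if skip_empty and not desc.nsids:
            # Proposed behavior: a group without namespaces cannot affect I/O,
            # so it is ignored for ANATT purposes.
            continue
        if desc.state == ANA_CHANGE:
            count += 1
    return count


def should_arm_anatt_timer(descs, skip_empty):
    return count_change_groups(descs, skip_empty) > 0
```

With a log that contains one populated group and one empty group in
"change" state, skip_empty=False arms the timer while skip_empty=True
does not; a "change" group that still holds a namespace arms it either
way.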

My approach to assigning ANA groups to namespaces is based on the idea
that on one node (i.e. "system") a namespace usually has the same state
on every port, since it is more likely that the access state of the
namespace would change than what it is accessed through (the port).
So I simply pre-allocate 5 ANA groups on each port, one for each of the
5 currently possible ANA states, and then change the "ANAGRPID" of a
namespace to transition it from one state to another.
While it is perfectly possible, as highlighted earlier, to transition
without going through the "change" state, that state is still
preferable in my opinion when the final state is not known "a priori",
and it thus works as a graceful guard against the host's I/O. This is
why I opt to pre-allocate a group for this state too.
However, on modern versions of popular distributions that causes the
reset issue described above, which might have an undetermined impact on
my I/O in progress.
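
The scheme above can be sketched roughly as follows. This is a toy
Python model under my own assumptions, not target code: the Port class,
its methods and the group IDs 1..5 are hypothetical, and I assume the
log reports every pre-allocated group, including empty ones.

```python
# ANA state values as defined by the NVMe spec.
ANA_STATES = {
    "optimized": 0x01,
    "non-optimized": 0x02,
    "inaccessible": 0x03,
    "persistent-loss": 0x04,
    "change": 0x0F,
}


class Port:
    """Toy model of one port with one pre-allocated ANA group per state."""

    def __init__(self):
        # Group IDs 1..5 are arbitrary; each group keeps a fixed state forever.
        self.grpid_for_state = {
            name: grpid for grpid, name in enumerate(ANA_STATES, start=1)
        }
        self.ns_grpid = {}  # NSID -> ANAGRPID

    def set_ns_state(self, nsid, state_name):
        # A state transition is expressed by changing the namespace's ANAGRPID.
        self.ns_grpid[nsid] = self.grpid_for_state[state_name]

    def ana_log(self):
        """Return (grpid, state, nsids) per group, empty groups included."""
        log = []
        for name, grpid in self.grpid_for_state.items():
            nsids = sorted(n for n, g in self.ns_grpid.items() if g == grpid)
            log.append((grpid, ANA_STATES[name], nsids))
        return log
```

Once every namespace has left the "change" group, that group is still
reported with its "change" state and no NSIDs, which is exactly the
"nnsids=0" situation discussed above.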

Thus, I find starting the ANATT timer redundant when "nnsids=0".
I think the only users such a change might affect are those who use
this as a dirty hack to reset the controller from the host (when would
that be helpful, though?).
Otherwise, I have prepared and checked against mainline a simple
(+2 lines, -2 lines) patch that fixes this behavior, so I can send it
if it is preferable to have this discussion around an actual change.

 > Now this treads into the TP 4108 space. There is currently no way to
 > report anything that impacts "only one namespace at a time". ANY
 > report of a change (AEN) for any namespace is always reporting a
 > state change for the entire group that contains the namespace where
 > the event occurred. That is the WHOLE POINT of ANA Groups. AND,
 > that is the whole point of TP4108 - to address that kind of situation
 > (where a change impacts only 1 namespace). Until TP4108 address this
 > situation, a single namespace changing the ANAGRPID is ugly. Maybe
 > we should get to work on that TP.

I am not a member of a committee or anything (unfortunately), so I have
no idea what TP 4108 is about or where to find it.
But my main point in that passage was not how little data would be
exchanged between target and hosts, but rather for how many namespaces
the relation to their associated ANA state would change, so as to
highlight the contrast between changing the ANA state of a group and
changing the ANAGRPID of a namespace.
Again, I do not disagree that it's ugly. As for why I can't just assign
each namespace (assuming NSID is global on my target system rather than
per subsystem) a separate ANA group: that is due to the 8-fold
difference between the allowed number of namespaces and of ANA groups.
I proposed to parametrize that in my previous message but unfortunately
got no reply in that regard.

Hope that more or less cleared things up.

Thanks for your time!

Best regards,
Alex



