NVMe over Fabrics host: behavior on presence of ANA Group in "change" state
Alex Talker
alextalker at yandex.ru
Fri Feb 11 01:21:36 PST 2022
> [FK> ] I think I'm missing a bunch of context here. What is the
> original question? I take a stab at some assumptions: What is an
> empty ANA group? That is an ANA Group with NO NSIDs associated with
> that group. Meaning the "Number of NSID Values" field is cleared to
> '0h' in the ANA Group Descriptor. That descriptor can be used to
> update some host internal state information related to that ANA
> group, but it has no impact on any I/O because there can be no I/O
> (since there are no NSID values). So I'm not sure where that is
> going (because RGO=1 also can return ANA Groups that have state, but
> no attached namespaces (it's a way to get group state without any
> NSID inventory requirements)).
That's exactly right, it is the "nnsids=0" case. I/O is certainly not a
problem for such a group.
I suppose the main point of contention here is that when such a group
is in the "change" ANA state,
the host (the "nvme-core" module) starts the ANATT timer, which resets
the controller upon expiration.
Now, I do not disagree that having such a group is "ugly", but I would
argue that the ANATT-related functionality should only be invoked in
the "nnsids>0" case,
since only then is there a relation between the "change" state and a
namespace via "ANAGRPID".
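
To make the difference concrete, here is a minimal sketch in Python. It
is only a model of the decision, not the actual kernel code, and every
name in it (AnaGroupDesc, should_start_anatt_timer, skip_empty) is made
up for illustration. The only value taken from the spec is 0xF for the
"change" state.

```python
from dataclasses import dataclass, field
from typing import List

ANA_CHANGE = 0xF  # "change" ANA state value from the NVMe specification


@dataclass
class AnaGroupDesc:
    """Hypothetical, simplified stand-in for one ANA Group Descriptor."""
    grpid: int
    state: int
    nsids: List[int] = field(default_factory=list)  # empty list == nnsids=0


def should_start_anatt_timer(descs: List[AnaGroupDesc], skip_empty: bool) -> bool:
    """Return True if the host would arm the ANATT timer for this log page.

    skip_empty=False models the behavior described above (any group in
    "change" arms the timer); skip_empty=True models the proposal
    (only groups with nnsids>0 count).
    """
    for desc in descs:
        if desc.state != ANA_CHANGE:
            continue
        if skip_empty and not desc.nsids:
            continue  # empty group: no namespace is tied to this state
        return True
    return False


log_page = [
    AnaGroupDesc(grpid=1, state=0x1, nsids=[1, 2]),  # optimized, two namespaces
    AnaGroupDesc(grpid=5, state=ANA_CHANGE),         # "change", nnsids=0
]
print(should_start_anatt_timer(log_page, skip_empty=False))  # True
print(should_start_anatt_timer(log_page, skip_empty=True))   # False
```

With skip_empty=True the timer is still armed as soon as any namespace
is actually placed into a group in the "change" state.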
My approach to assigning ANA groups to namespaces is based on the idea
that on one node (i.e. "system") a namespace usually has the same state
on every port,
since it's more likely that the access state of the namespace would
change than what it is accessed through (the port),
so I simply pre-allocate 5 ANA groups on each port, one for each of the
5 ANA states possible at the moment, and then change the "ANAGRPID" of
a namespace to transition it from one state to another.
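
Roughly, the scheme looks like the following Python sketch. This is a
toy model and not my target's code: the Port class, its methods and the
choice of group IDs 1..5 are assumptions made for the example; only the
state values come from the spec.

```python
# ANA state values as defined by the NVMe specification.
ANA_STATES = {
    "optimized": 0x1,
    "non-optimized": 0x2,
    "inaccessible": 0x3,
    "persistent-loss": 0x4,
    "change": 0xF,
}


class Port:
    """Hypothetical model of one target port with one ANA group per state."""

    def __init__(self):
        # Pre-allocate one group per state; IDs 1..5 are arbitrary here.
        self.group_of_state = {
            name: grpid for grpid, name in enumerate(ANA_STATES, start=1)
        }
        self.ns_anagrpid = {}  # NSID -> ANAGRPID

    def set_ns_state(self, nsid, state_name):
        """Transition a namespace by changing its ANAGRPID, not the group state."""
        self.ns_anagrpid[nsid] = self.group_of_state[state_name]

    def nnsids(self, state_name):
        """Number of namespaces currently attached to the group for a state."""
        grpid = self.group_of_state[state_name]
        return sum(1 for g in self.ns_anagrpid.values() if g == grpid)


port = Port()
port.set_ns_state(1, "optimized")
port.set_ns_state(2, "optimized")
port.set_ns_state(1, "change")       # only namespace 1 is affected
print(port.nnsids("optimized"))      # 1
print(port.nnsids("change"))         # 1
port.set_ns_state(1, "inaccessible")
print(port.nnsids("change"))         # 0, the group is empty but still exists
```

The last line is the situation in question: the group for the "change"
state always exists, and most of the time it has no namespaces in it.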
While it is perfectly possible, as highlighted earlier, to transition
while bypassing the "change" state,
going through it is still preferable in my opinion in situations where
the final state is not known "a priori",
since it works as a graceful guard against the host's I/O. This is why
I opt to pre-allocate a group for this state too;
however, on modern versions of popular distributions that causes the
reset issue described above,
which might have an undetermined impact on my I/O in progress.
Thus, I find starting the ANATT timer redundant when "nnsids=0".
I think the only users such a change might affect are those who use
this as a dirty hack to reset the controller from the host (when would
that be helpful, though?).
Otherwise, I have prepared and checked against mainline a simple (+2
lines, -2 lines) patch that fixes this behavior,
so I can send it if it's preferable to have this discussion around an
actual change.
> Now this treads into the TP 4108 space. There is currently no way to
> report anything that impacts "only one namespace at a time". ANY
> report of a change (AEN) for any namespace is always reporting a
> state change for the entire group that contains the namespace where
> the event occurred. That is the WHOLE POINT of ANA Groups. AND,
> that is the whole point of TP4108 - to address that kind of situation
> (where a change impacts only 1 namespace). Until TP4108 addresses this
> situation, a single namespace changing the ANAGRPID is ugly. Maybe
> we should get to work on that TP.
I'm not a member of any committee (unfortunately), so I have no idea
what TP 4108 is about or where to find it.
But my main point in that passage was not about how little data would
be exchanged between target and hosts, but rather about how many
namespaces would have their associated ANA state changed,
so as to highlight the contrast between changing the ANA state of a
group and changing the ANAGRPID of a namespace.
Again, I do not disagree that it's ugly. As for why I can't just go and
assign each namespace (assuming NSID is global on my target system
rather than per subsystem) a separate ANA group:
there is an 8x difference between the allowed number of the former and
the latter. I proposed to parametrize that in my previous message, but
unfortunately got no reply in that regard.
Hope that more or less clears things up.
Thanks for your time!
Best regards,
Alex