NVMe over Fabrics host: behavior on presence of ANA Group in "change" state

Sagi Grimberg sagi at grimberg.me
Mon Feb 7 14:16:02 PST 2022


Alex,

Can you please stop top-posting? It's difficult to follow the
copy-paste top-posting chain you are generating.

>  > I'm not exactly sure what you are trying to do, but it sounds
>  > wrong... ANA groups are supposed to be a logical unit that
>  > expresses a controller's access state to the namespaces that
>  > belong to the group.
> 
> I do agree that my setup might seem odd, but I doubt it contradicts
> your statement much, since each group still represents the state of
> the namespaces belonging to it. The difference is just that instead
> of a complex (or deployment-dependent) relationship between a
> namespace and an ANA group, I opted to balance the flexibility of
> assigning a state per namespace against having a constant set of ANA
> groups on each system.
> In my view, it is far more common for one namespace to have trouble
> while the others don't, in which case it had better be unavailable on
> all ports at once, than for a certain port to need to deny access to
> certain namespaces for, say, maintenance reasons.

Not exactly sure what you are alluding to. I didn't suggest any
static configuration. I was just explaining what ANA groups express.
They're there; you are free to use them however you like.
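
For reference, this is how the spec (and the Linux headers) model it:
each entry in the ANA log page pairs one group's state with the list
of namespaces that currently belong to it. A sketch mirroring
struct nvme_ana_group_desc from include/linux/nvme.h (field names as
in mainline; the comments are mine):

struct nvme_ana_group_desc {
	__le32	grpid;		/* ANA Group Identifier */
	__le32	nnsids;		/* number of NSIDs in the group (may be 0) */
	__le64	chgcnt;		/* change count for this group */
	__u8	state;		/* bits 3:0: 1 = optimized,
				 * 2 = non-optimized, 3 = inaccessible,
				 * 4 = persistent loss, 0xf = change */
	__u8	rsvd17[15];
	__le32	nsids[];	/* NSIDs belonging to this group */
};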

>  > That is an abuse of ANA groups IMO. But OK...
> 
> I do not disagree, but the standard seems to do so as well.

The standard abuses ANA groups?

> But let me try to explain my perspective with an analogy that may be
> more familiar to you.
> As you are probably aware, with ALUA in SCSI, the Target Port Groups
> mechanism lets one freely specify a given LUN's (ALUA) state on a set
> of targets (at least in the SCST implementation).
> I'm not sure about the exact limitations, but I think it's quite easy
> to keep a 1 LUN = 1 group ratio for flexible control.
> However, as I highlighted in an earlier message, the nvmet
> implementation allows only 128 ANA Groups, while each (!) subsystem
> may hold up to 1024 namespaces.

If your use-case needs more than 128 groups, you can send a patch.
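
(For reference, both limits are compile-time constants in the target
code; from drivers/nvme/target/nvmet.h in current mainline, if memory
serves:)

/* drivers/nvme/target/nvmet.h */
#define NVMET_MAX_NAMESPACES	1024
#define NVMET_MAX_ANAGRPS	128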

> Thus, while I would have no issue with, say, assigning a group per
> namespace (assuming that NSIDs are globally unique on my target),
> this is currently not possible, so I'm trying to do my best within
> these restrictions while keeping the ANA Group setup as
> straightforward as possible.
> One may argue that I should dump everything into one ANA Group, but
> that would contradict my expectation of high availability for the
> namespaces that are still (mostly?) working while others aren't.

I don't understand where HA/multipathing comes into play here at all,
let alone asymmetric access. But it doesn't really matter.

> One may also argue that it's rare to have more than 128 namespaces
> in total in production, but I would still prefer to support 1024
> anyway.
> I hope that clears it up; do feel free to correct me if my reasoning
> is flawed somewhere.
> 
>  > This state is not a permanent state, it is transient by
>  > definition, which is why the host treats it as such.
>  >
>  > The host expects the controller to send another ANA AEN that
>  > notifies the new state within ANATT (i.e. stateA -> change ->
>  > stateB).
> 
> As Hannes mentioned, and I agree, the state is indeed transient, but
> only in relation to a namespace,

Where did you get that from the spec? How can whether the ANA group
has zero namespaces or not determine whether a state is transient or
persistent? The two are completely orthogonal. Any relationship
between the ANA group state's lifetime and the number of namespaces
that belong to the group makes no sense to me at all, tbh.

> so I find it to be zero issue to have a group in the change state
> with 0 namespaces as its members.

What is "zero issue"? you mean a non-issue?
The spec defines this state as a state that represents transition
between states, and hence its not surprising that the host expects it to
be as such. IMO the current host behavior is correct.
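
To make that expectation concrete, here is a rough sketch of the
host-side logic (a simplification of drivers/nvme/host/multipath.c,
not the literal code): a group reporting "change" arms a timer for
ANATT seconds; if no further ANA log update resolves the transition
before it fires, the host gives up and resets the controller.

static void host_handle_ana_state(struct nvme_ctrl *ctrl,
				  struct nvme_ana_group_desc *desc)
{
	if ((desc->state & 0xf) == NVME_ANA_CHANGE) {
		/* transient by definition: expect stateA -> change ->
		 * stateB within ANATT, so start the countdown */
		mod_timer(&ctrl->anatt_timer,
			  jiffies + ctrl->anatt * HZ);
		return;
	}
	/* a definitive state arrived: the transition completed */
	del_timer_sync(&ctrl->anatt_timer);
}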

> I understand that it would be nice and dandy to change the state of
> multiple namespaces at once (if one takes the time to configure such
> a dependency between them), but for now I opt for a simpler yet
> flexible solution, maybe at the cost of a greater number of ANA log
> changes in the worst-case scenario.
> Thus, the cycle "namespace in state A" => "namespace in change state"
> => "namespace in state B" is still preserved, though by different
> means (changing the namespace's group rather than the group's state).

nvmet is actually violating the spec right now: it doesn't set bit 6
in the Identify Controller ANACAP field (which indicates that a
namespace's ANAGRPID may change), yet it clearly exposes anagrpid as
a config knob. So we either need to block that knob or set the bit.
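
For the record, the latter would be a one-liner in the identify path;
an untested sketch against nvmet_execute_identify_ctrl() in
drivers/nvme/target/admin-cmd.c:

	/* Untested sketch: advertise ANACAP bit 6 ("ANAGRPID may
	 * change"), since nvmet already lets namespaces move between
	 * groups via the writable ana_grpid configfs attribute. */
	id->anacap = (1 << 0) | (1 << 1) | (1 << 2) |
		     (1 << 3) | (1 << 4) | (1 << 6);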

In any event, you can move namespaces between ANA groups as much as
you like; you don't need the change state at all. Just don't use it,
and especially don't keep it permanently, which is what neither the
host nor the spec expects.

>  > That is simply removing support for multipathing altogether.
> 
> You're not wrong on that one, though, no offense,

None taken :)

> for certain configurations or certain initiators that's the way to
> go, especially when it might just be a matter of switching from one
> implementation to another (i.e. good old dm-multipath).

Whatever works for you...

> I mainly mentioned this because it fixes the issue on some kernels
> (including mainline/LTS) but not on others, which is why I think it's
> important that this misinterpretation of the standard be accounted
> for in the mainstream code: I can't possibly patch every back-ported
> kernel out there (I'm personally looking at the CentOS world right
> now), yet those may be the end users of my target setups.
> My territory is mainly the target, and this is not an issue I can fix
> on my side.

Again, there is no issue here that I see. The only issue I see right now
is that nvmet is allowing namespaces to change anagrpid while telling
the host that it won't change.

>  > Could be... We'll need to see patches.
> 
> On that regard, I have seen plenty of git-related mails around here,
> so would it be possible to publish patches as a few commits based on 
> mainline or infradead git repo on GitHub or something?

Not really.

> Or is it mandatory to go, no offense, the old-fashioned way and send
> patch files as attachments or inline text?

No attachments please, follow the instructions in:
Documentation/process/submitting-patches.rst


