[PATCH v2] nvmet: force reconnect when number of queue changes
James Smart
jsmart2021 at gmail.com
Thu Oct 6 13:15:17 PDT 2022
On 10/6/2022 4:37 AM, Sagi Grimberg wrote:
>
>>>> As far as I can tell, what is missing from a testing point of view is
>>>> the ability to fail requests without the DNR bit set or the ability to
>>>> tell the host to reconnect. Obviously, an AEN would be nice for this but
>>>> I don't know if this is reason enough to extend the spec.
>>>
>>> Looking into the code, it's the connect that fails on invalid parameter
>>> with a DNR, because the host is attempting to connect to a subsystem
>>> that does not exist on the port (because it was taken offline for
>>> maintenance reasons).
>>>
>>> So I guess it is valid to allow queue change without removing it from
>>> the port, but that does not change the fundamental question on DNR.
>>> If the host sees a DNR error on connect, my interpretation is that the
>>> host should not retry the connect command itself, but it shouldn't imply
>>> anything on tearing down the controller and giving up on it completely,
>>> forever.
>>
>> Okay, let me try to avoid the DNR discussion for now and propose
>> something else? What about adding an 'enable' attribute to the subsys?
>>
>> The snippet below does the trick, though there is no explicit
>> synchronization between host and target, so it's possible the host
>> doesn't notice that the subsystem toggled enabled and updated the number
>> of queues. But I'm not sure it's worth addressing this; it feels a bit
>> over-engineered.
>
> I think that for the matter of this patch, you can keep force reconnect.
>
> But I still think we need to be consistent across the different transports
> on how we interpret a controller returning DNR...
>
I agree - behavior should be the same regardless of transport. DNR is
pretty specific in its definition - "If the same command is re-submitted
to any controller in the NVM subsystem, then that re-submitted command
is expected to fail."
But what we forget is that "the command" is actually "command X with fields
set this way". If we change the fields, it may actually succeed.
So if we're re-issuing Connect in the same way, without changing any
values, we're better off not reconnecting. But if Connect changes,
we're back to square one.
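
To make the distinction concrete, here is a rough sketch of the check I
have in mind. It is not taken from any driver: the struct, the field
subset, and the function names are all made up for illustration, and the
DNR mask is assumed to match the 0x4000 value Linux uses for NVME_SC_DNR.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* DNR bit in the completion status field. Assumption: same value as
 * Linux's NVME_SC_DNR (0x4000). */
#define SKETCH_SC_DNR 0x4000

/* Hypothetical subset of the Connect fields a host might vary between
 * attempts. Not a real kernel structure. */
struct sketch_connect_params {
	uint16_t qid;
	uint16_t sqsize;
	uint16_t cntlid;
	char subsysnqn[256];
	char hostnqn[256];
};

/* Field-by-field comparison; avoids memcmp() over struct padding. */
static bool sketch_params_equal(const struct sketch_connect_params *a,
				const struct sketch_connect_params *b)
{
	return a->qid == b->qid &&
	       a->sqsize == b->sqsize &&
	       a->cntlid == b->cntlid &&
	       strcmp(a->subsysnqn, b->subsysnqn) == 0 &&
	       strcmp(a->hostnqn, b->hostnqn) == 0;
}

/*
 * Decide whether issuing Connect again is worthwhile.
 *  - No DNR: the failure may be transient, so a retry is allowed.
 *  - DNR and identical fields: the same command is expected to fail again.
 *  - DNR but different fields: this is effectively a new command.
 */
bool sketch_connect_retry_ok(uint16_t status,
			     const struct sketch_connect_params *failed,
			     const struct sketch_connect_params *next)
{
	if (!(status & SKETCH_SC_DNR))
		return true;
	return !sketch_params_equal(failed, next);
}
```

Whether a given transport can actually change any Connect field between
attempts is a separate question; the sketch only shows that DNR applies
to the command as it was issued.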
-- james