[PATCH v2] nvmet: force reconnect when number of queue changes

James Smart jsmart2021 at gmail.com
Thu Oct 6 13:15:17 PDT 2022


On 10/6/2022 4:37 AM, Sagi Grimberg wrote:
> 
>>>> As far I can tell, what's is missing from a testing point of view is 
>>>> the
>>>> ability to fail requests without the DNR bit set or the ability to tell
>>>> the host to reconnect. Obviously, an AEN would be nice for this but I
>>>> don't know if this is reason enough to extend the spec.
>>>
>>> Looking into the code, its the connect that fails on invalid parameter
>>> with a DNR, because the host is attempting to connect to a subsystems
>>> that does not exist on the port (because it was taken offline for
>>> maintenance reasons).
>>>
>>> So I guess it is valid to allow queue change without removing it from
>>> the port, but that does not change the fundamental question on DNR.
>>> If the host sees a DNR error on connect, my interpretation is that the
>>> host should not retry the connect command itself, but it shouldn't imply
>>> anything on tearing down the controller and giving up on it completely,
>>> forever.
>>
>> Okay, let me try to avoid the DNR discussion for now and propose
>> something else? What about adding a 'enable' attribute to the subsys?
>>
>> The snipped below does the trick. Though There is no explicit
>> synchronization between host and target, so it's possible the host
>> doesn't notice that the subsystem toggled enabled and updated the number
>> queues. But not sure if it's worth to address this, it feels a bit
>> over-engineered.
> 
> I think that for the matter of this patch, you can keep force reconnect.
> 
> But I still think we need to be consistent with the different transports
> on how we interperet controller returning DNR...
> 

I agree - behavior should be the same regardless of transport.  DNR is 
pretty specific in its definition - "If the same command is re-submitted 
to any controller in the NVM subsystem, then that re-submitted command 
is expected to fail"

But, what we forget is "the command" is actually "command X with fields 
set this way".  If we change the fields, it may actually succeed.

So if we're re-issuing Connect in the same way without any changing 
values, we're better off not reconencting.   But if Connect changes, 
we're back to ground 0.

-- james




More information about the Linux-nvme mailing list