[PATCH v2] nvmet: force reconnect when number of queue changes

John Meneghini jmeneghi at redhat.com
Wed Sep 28 08:01:00 PDT 2022


Fred, what's the difference between a "reset" and a "connection loss"?

As Daniel explained, the problem case in question involves a controller that "disconnects" from the network, and then reconnects 
after an HA takeover/giveback event. And the fact is, this controller changes the number of IOQs across the 
"disconnect/reconnect".

So this is one place where I think we conflate NVMe-oF with NVMe in the specification.  If "reset" == "disconnect", then 
Daniel's patches just implemented support for something which is forbidden by the spec - a controller that changes the number of 
IOQs between resets.

/John

5.27.1.5 Number of Queues (Feature Identifier 07h)

This Feature indicates the number of queues that the host requests for the controller processing the
command. This feature shall only be issued during initialization prior to creation of any I/O Submission
and/or Completion Queues. If a Set Features command is issued for this feature after creation of any I/O
Submission and/or I/O Completion Queues, then the Set Features command shall abort with status code
of Command Sequence Error. The controller shall not change the value allocated between resets. For a
Set Features command, the attributes are specified in Command Dword 11 (refer to Figure 322). For a Get
Features command, Dword 11 is ignored.
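
For reference, the negotiated value is easy to inspect from a host with
nvme-cli. This is only an untested sketch and the device name is just an
example:

    # Read back feature 07h (Number of Queues) from a live controller.
    # The completion reports NSQA/NCQA, i.e. the zero-based queue counts
    # the controller allocated at initialization.
    nvme get-feature /dev/nvme0 -f 0x07 -H

    # Per the spec text above, changing it after the I/O queues exist
    # must fail with Command Sequence Error; a request like this one
    # (4 SQs and 4 CQs, zero-based, in dword 11) is expected to be
    # rejected on an initialized controller:
    nvme set-feature /dev/nvme0 -f 0x07 -v 0x00030003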


On 9/28/22 08:39, Knight, Frederick wrote:
> Would this be a case for a new AEN - controller configuration changed?  I'm also wondering exactly what changed in the controller?  It can't be the "Number of Queues" feature (that can't change - The controller shall not change the value allocated between resets.).  Is it the MQES field in the CAP property that changes (queue size)?
> 
> We already have change reporting for: Namespace attribute, Predictable Latency, LBA status, EG aggregate, Zone descriptor, Discovery Log, Reservations. We've questioned whether we need a Controller Attribute Changed.
> 
> Would this be a case for an exception?  Does the DNR bit apply only to commands sent on queues that already exist (i.e., NOT the connect command since that command is actually creating the queue)?  FWIW - I don't like exceptions.
> 
> Can you elaborate on exactly what is changing?
> 
> 	Fred
> 
>> -----Original Message-----
>> From: Daniel Wagner <dwagner at suse.de>
>> Sent: Wednesday, September 28, 2022 5:03 AM
>> To: Sagi Grimberg <sagi at grimberg.me>
>> Cc: linux-nvme at lists.infradead.org; Shinichiro Kawasaki
>> <shinichiro.kawasaki at wdc.com>; hare at suse.de; Knight, Frederick
>> <Frederick.Knight at netapp.com>
>> Subject: Re: [PATCH v2] nvmet: force reconnect when number of queue
>> changes
>>
>> On Wed, Sep 28, 2022 at 11:31:43AM +0300, Sagi Grimberg wrote:
>>>
>>>>>> In order to be able to test queue number changes we need to make
>>>>>> sure that the host reconnects.
>>>>>>
>>>>>> The initial idea was to disable and re-enable the ports and have
>>>>>> the host wait until the KATO timer expires and enter error
>>>>>> recovery. But in this scenario the host could see DNR for a
>>>>>> connection attempt, which results in the host dropping the
>>>>>> connection completely.
>>>>>>
>>>>>> We can force the host to reconnect by deleting all controllers
>>>>>> connected to the subsystem, which results in the host observing a
>>>>>> failing command and trying to reconnect.
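
For context, the flow described above boils down to something like the
following rough, untested sketch; the attr_qid_max name and the subsystem
NQN are assumptions about how the target exposes the queue count:

    SUBSYS=/sys/kernel/config/nvmet/subsystems/testnqn

    # Target side: change the advertised number of I/O queues. With the
    # patch under discussion this drops all controllers of the subsystem,
    # so the next host I/O fails and kicks off error recovery + reconnect.
    echo 4 > "$SUBSYS/attr_qid_max"

    # Host side: the controller should drop out and come back on its own;
    # verify that it reappears instead of being torn down for good.
    nvme list-subsys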
>>>>>
>>>>> This looks like a change that attempts to fix a host issue from
>>>>> the target side... Why do we want to do that?
>>>>
>>>> It's not a host issue at all. The scenario I'd like to test is when
>>>> the target changes this property while the host is connected (e.g.
>>>> software update -> new configuration). I haven't found a way to
>>>> signal the host to reset/reconnect from the target. Hannes suggested
>>>> deleting all controllers from the given subsystem, which will
>>>> trigger the recovery process on the host on the next request. This
>>>> makes this test work.
>>>
>>> But that is exactly like doing:
>>> - remove subsystem from port
>>> - apply q count change
>>> - link subsystem to port
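
In configfs terms the sequence Sagi describes is roughly the following
(untested sketch; the port id, NQN and attr_qid_max name are assumptions):

    CFG=/sys/kernel/config/nvmet

    # 1) remove the subsystem from the port; connected hosts lose the
    #    association and start their periodic reconnect attempts
    rm $CFG/ports/1/subsystems/testnqn

    # 2) apply the queue count change while nothing is connected
    echo 4 > $CFG/subsystems/testnqn/attr_qid_max

    # 3) link the subsystem back to the port; the next reconnect attempt
    #    succeeds and renegotiates the queue count
    ln -s $CFG/subsystems/testnqn $CFG/ports/1/subsystems/testnqn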
>>>
>>> Your problem is that the target returns an error code that makes the
>>> host never reconnect. That is a host behavior, and that behavior
>>> differs between transports.
>>
>> Yes, I'm trying to avoid triggering the DNR.
>>
>>> This is why I'm not clear on whether this is the right place to
>>> address this issue.
>>>
>>> I personally do not understand why a DNR completion makes the host
>>> choose not to reconnect. DNR means "do not retry" for the command
>>> itself (which the host adheres to); it does not have any bearing on
>>> the reset/reconnect logic.
>>
>> I am just the messenger: besides Hannes' objection in the last mail thread, I
>> got this private reply from Fred Knight:
>>
>>     Do Not Retry (DNR): If set to ‘1’, indicates that if the same command is
>>     re-submitted to any controller in the NVM subsystem, then that
>>     re-submitted command is expected to fail. If cleared to ‘0’, indicates
>>     that the same command may succeed if retried. If a command is aborted
>>     due to time limited error recovery (refer to the Error Recovery section
>>     in the NVM Command Set Specification), this bit should be cleared to
>>     ‘0’. If the SCT and SC fields are cleared to 0h, then this bit should be
>>     cleared to ‘0’.
>>
>>     It simply makes NO SENSE to retry that command. If the device wants the
>>     host to retry, then it will clear DNR=0.
>>
>>> In my mind, a possible use-case is that a subsystem can be un-exported
>>> from a port for maintenance reasons, and rely on the host to
>>> periodically attempt to reconnect, and this is exactly what your test
>>> is doing.
>>
>> Yes, and that's the intended test case. The queue count change is on
>> top of this scenario. It's a combined test case.
>>
>>>> Though if you have a better idea of how to signal the host to
>>>> reconfigure itself, I'd be glad to work on it.
>>>
>>> I think we should first agree on what the host should/shouldn't do and
>>> make the logic consistent between all transports. Then we can talk
>>> about how to write a test for your test case.
>>
>> Fair enough. This was just my cheesy attempt to get things moving.



