[PATCH v2] nvmet: force reconnect when number of queue changes

Sagi Grimberg sagi at grimberg.me
Wed Sep 28 01:31:43 PDT 2022


>>> In order to be able to test queue number changes we need to make sure
>>> that the host reconnects.
>>>
>>> The initial idea was to disable and re-enable the ports and have the
>>> host wait until the KATO timer expires and enter error
>>> recovery. But in this scenario the host could see DNR for a connection
>>> attempt, which results in the host dropping the connection completely.
>>>
>>> We can force the host to reconnect by deleting all controllers
>>> connected to the subsystem, which results in the host observing a
>>> failing command and trying to reconnect.
>>
>> This looks like a change that attempts to fix a host issue from the
>> target side... Why do we want to do that?
> 
> It's not a host issue at all. The scenario I'd like to test is when the
> target changes this property while the host is connected (e.g. software
> update -> new configuration). I haven't found a way to signal the host
> to reset/reconnect from the target. Hannes suggested to delete all
> controllers from the given subsystem, which will trigger the recovery
> process on the host on the next request. This makes the test work.

But that is exactly like doing (sketched below):
- remove subsystem from port
- apply q count change
- link subsystem to port
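
To make that concrete, here is a minimal user-space sketch of the
sequence above. The port number, the subsystem NQN and the attr_qid_max
attribute name are assumptions for illustration, not something taken
from your patch:

/*
 * Hedged sketch only: drive the nvmet configfs sequence from user space.
 * Port "1", the subsystem name and attr_qid_max are assumed names.
 */
#include <stdio.h>
#include <unistd.h>

#define SUBSYS_DIR "/sys/kernel/config/nvmet/subsystems/blktests-subsystem-1"
#define PORT_LINK  "/sys/kernel/config/nvmet/ports/1/subsystems/blktests-subsystem-1"

static int write_attr(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	fputs(val, f);
	return fclose(f);
}

int main(void)
{
	/* 1. remove subsystem from port: the export goes away */
	if (unlink(PORT_LINK))
		perror("unlink");

	/* 2. apply the queue count change while nothing is exported */
	if (write_attr(SUBSYS_DIR "/attr_qid_max", "8"))
		perror("attr_qid_max");

	/* 3. link subsystem back to the port; the host is expected to
	 *    pick it up again on its own reconnect attempts */
	if (symlink(SUBSYS_DIR, PORT_LINK))
		perror("symlink");

	return 0;
}

The point of the comparison is that step 1 already leaves the host with
nothing but its own reconnect logic, without the target deleting
anything.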

Your problem is that the target returns an error code that makes the
host never reconnect. That is host behavior, and that behavior differs
between the transports.

This is why I'm not clear on whether this is the right place to
address this issue.

I personally do not understand why a DNR completion makes the host
choose to not reconnect. DNR means "do not retry" for the command
itself (which the host adheres to); it has no bearing on the
reset/reconnect logic.
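
As a sketch of what I mean (user space, with NVME_SC_DNR matching the
kernel's definition of the Do Not Retry bit in the shifted status; the
reconnect policy below is the behavior I'd expect, not necessarily what
every transport does today):

#include <stdbool.h>
#include <stdio.h>

#define NVME_SC_DNR		0x4000	/* Do Not Retry, as in include/linux/nvme.h */
#define NVME_SC_INVALID_FIELD	0x0002

/* DNR only says "do not retry this particular command"... */
static bool nvme_cmd_retryable(unsigned short status)
{
	return !(status & NVME_SC_DNR);
}

/* ...it says nothing about tearing the controller down for good
 * instead of entering the normal reset/reconnect path. */
static bool nvme_should_reconnect(unsigned short status)
{
	(void)status;	/* policy decision, not derived from DNR */
	return true;
}

int main(void)
{
	unsigned short status = NVME_SC_INVALID_FIELD | NVME_SC_DNR;

	printf("retry command: %d, schedule reconnect: %d\n",
	       nvme_cmd_retryable(status), nvme_should_reconnect(status));
	return 0;
}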

In my mind, a possible use-case is that a subsystem is un-exported
from a port for maintenance reasons, relying on the host to
periodically attempt to reconnect, and this is exactly what your test
is doing.

> Though if you have a better idea how to signal the host to reconfigure
> itself, I am glad to work on it.

I think we should first agree on what the host should/shouldn't do and
make the logic consistent across all transports. Then we can talk about
how to write a test for your test case.


