[PATCH 12/17] nvme-fabrics: reset connection for secure concatenation

Hannes Reinecke hare at suse.de
Wed May 8 02:21:49 PDT 2024


On 4/21/24 13:20, Sagi Grimberg wrote:
> 
> 
> On 09/04/2024 1:08, Hannes Reinecke wrote:
>> On 4/8/24 23:21, Sagi Grimberg wrote:
>>>
>>>
>>> On 08/04/2024 9:21, Hannes Reinecke wrote:
>>>> On 4/7/24 23:46, Sagi Grimberg wrote:
>>>>>
>>>>>
>>>>> On 18/03/2024 17:03, Hannes Reinecke wrote:
>>>>>> When secure concatenation is requested the connection needs to be
>>>>>> reset to enable TLS encryption on the new connection.
>>>>>> That implies that the original connection used for the DH-CHAP
>>>>>> negotiation really shouldn't be used, and we should reset as soon
>>>>>> as the DH-CHAP negotiation has succeeded on the admin queue.
>>>>>> The current implementation does not allow us to easily skip
>>>>>> connection attempts on the I/O queues, so we connect the I/O
>>>>>> queues, but disable namespace scanning on them.
>>>>>> With that, no I/O can be issued on these queues, so we
>>>>>> can tear them down quickly without having to wait for
>>>>>> quiescing etc.
>>>>>
>>>>> We shouldn't have to connect io queues here. The scan prevention
>>>>> is just a hack...
>>>>>
>>>> Oh, believe me, I tried. What _should_ be possible is to create a
>>>> controller with just admin queues, and skip the io queue setup part.
>>>> But once you do that it's impossible to create io queues after reset
>>>> (which is what you'd need here).
>>>> I tried to fix this, but that ended up with a massive 
>>>> rewrite/reordering
>>>> of the init path. Which sure has its merits, but I deemed it out
>>>> of scope for this patchset.
>>>>
>>>> So I decided for this 'hack', and shelved the init path reorg for a
>>>> later patchset.
>>>
>>> Interesting that it requires a full rewrite. Can you share some info
>>> what particularly breaks when you change the nr_io_queues between
>>> resets? nr_io_queues can be changed across resets from the controller 
>>> side
>>> as well.
>>>
>> Just look at the 'new' parameter to nvme_tcp_configure_io_queues(),
>> and try to modify the code to _not_ require the parameter.
> 
> What happens if you set ctrl->queue_count to 0 and then restore it?
> 
That way lies madness.

The tagset is allocated only _once_, when nvme_tcp_configure_io_queues()
is called with new == true (i.e. during the initial connect).
So we cannot simply skip that call and delegate it to reset, as then
the tagset would never be allocated.
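
To illustrate, a paraphrased sketch of the ordering in
nvme_tcp_configure_io_queues() (drivers/nvme/host/tcp.c); arguments and
the freeze/teardown paths are condensed, so this isn't the verbatim code:

static int nvme_tcp_configure_io_queues(struct nvme_ctrl *ctrl, bool new)
{
	int ret;

	/* queues must exist first; see nvme_tcp_init_request() below */
	ret = nvme_tcp_alloc_io_queues(ctrl);
	if (ret)
		return ret;

	if (new) {
		/* done exactly once, on the initial connect */
		ret = nvme_alloc_io_tag_set(ctrl, &to_tcp_ctrl(ctrl)->tag_set,
				&nvme_tcp_mq_ops, 2,
				sizeof(struct nvme_tcp_request));
		if (ret)
			goto out_free_io_queues;
	}

	/* sends the fabrics Connect command on each I/O queue */
	ret = nvme_tcp_start_io_queues(ctrl, 1,
			ctrl->tagset->nr_hw_queues + 1);
	if (ret)
		goto out_free_io_queues;

	/* queue freezing/unfreezing elided */
	return 0;

out_free_io_queues:
	nvme_tcp_free_io_queues(ctrl);
	return ret;
}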

And we have to call nvme_tcp_alloc_io_queues() before the tagset is
allocated, as nvme_alloc_io_tag_set() calls back into
nvme_tcp_init_request(), which requires the queues to be allocated.
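
For reference, a condensed paraphrase of that callback (again not the
verbatim code), showing why the queues have to be there first:

static int nvme_tcp_init_request(struct blk_mq_tag_set *set,
		struct request *rq, unsigned int hctx_idx,
		unsigned int numa_node)
{
	struct nvme_tcp_ctrl *ctrl = to_tcp_ctrl(set->driver_data);
	struct nvme_tcp_request *req = blk_mq_rq_to_pdu(rq);
	/* the I/O tagset maps to queues 1..N, the admin tagset to queue 0 */
	int queue_idx = (set == &ctrl->tag_set) ? hctx_idx + 1 : 0;
	struct nvme_tcp_queue *queue = &ctrl->queues[queue_idx];

	/* the command PDU is carved from a per-queue page_frag cache,
	 * so the queue must already be allocated when this runs */
	req->pdu = page_frag_alloc(&queue->pf_cache,
			sizeof(struct nvme_tcp_cmd_pdu) +
			nvme_tcp_hdgst_len(queue),
			GFP_KERNEL | __GFP_ZERO);
	if (!req->pdu)
		return -ENOMEM;

	req->queue = queue;
	nvme_req(rq)->ctrl = &ctrl->ctrl;
	return 0;
}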

So the only 'simple' choice is to not call nvme_tcp_start_io_queues(),
but by then all queues are already allocated and connected at the TCP
level, and we'd have to tear them down _anyway_.

Essentially we would just skip sending the 'connect' command on the I/O
queues; everything else would have to stay the same.
Which we could do, but then we're not saving that much here.
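
I.e. something like this purely hypothetical guard in
nvme_tcp_configure_io_queues() (the 'concat_pending' flag is made up
for illustration):

	/* hypothetical: skip only the fabrics Connect step during the
	 * first (DH-HMAC-CHAP only) pass of a secure concatenation setup;
	 * queue and tagset allocation above still run unchanged */
	if (!ctrl->opts->concat_pending)
		ret = nvme_tcp_start_io_queues(ctrl, 1,
				ctrl->tagset->nr_hw_queues + 1);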

>> Key observation here is that the _size_ of the I/O tagset is actually
>> defined by the parameters for the connect command; we cannot grow
>> beyond that. Which means that we should allocate the I/O tagset
>> right at the start, and only modify the number of queues which we _use_
>> based on the information we get back from the controller.
> 
> What does the tagset have to do with connecting the queues?
> 
See above: allocating the tagset calls back into init_request, which
requires the queues to be allocated.

>> Currently the io tagset allocation is tied to the number of queues
>> for the controller, and trying to modify that is an even worse hack
>> than suppressing the namespace scan.
> 
> I don't think it is... All I'm saying is that if we are connecting 64 io
> queues just to reconnect them right after with tls, we are doing
> something wrong.
> 

No argument here.

>>
>> Anyway, it's quite a different patchset, which is basically unrelated
>> to the secure concatenation work.
> 
> Well, they are required by the secure concatenation, but agree that
> they are preparatory.
> 
>>
>> I do have a preliminary patchset for this if there's interest.
> 
> Again, the fact that we connect io queues, disconnect them and then
> connect them again is backwards I think.
> 
Indeed.
But I guess we'll have a discussion at LSF about the initial connect
logic anyway.

Cheers,

Hannes



