[PATCH] nvmet-rdma: Avoid O(n^2) loop in delete_ctrl

Sagi Grimberg sagi at grimberg.me
Mon May 6 00:36:11 PDT 2024



On 06/05/2024 8:50, Christoph Hellwig wrote:
> On Sun, May 05, 2024 at 01:39:44PM +0300, Sagi Grimberg wrote:
>> From: Sagi Grimberg <sagi.grimberg at vastdata.com>
>>
>> When deleting an nvmet-rdma ctrl, we essentially loop over all
>> queues that belong to the controller and schedule removal of
>> each one. Instead of restarting the loop every time a queue is
>> found, do a simple safe list traversal.
>>
>> This avoids unneeded time spent scheduling queue removals in
>> cases where there are a lot of queues.
> I think the original reason for this was to avoid lock order dependencies
> and/or deadlocks; I wish I had documented that better.  Looking at
> the current version of __nvmet_rdma_queue_disconnect I can't find any
> obvious problem, but rdma_disconnect is a bit of a black box from
> the driver POV.

Yes, rdma_disconnect is a black box (essentially it moves the QP to the
error state and sends a CM disconnect request/response). But it does not
depend on nvmet_rdma_queue_mutex. It is true that the CM handler may take
this lock, but that handler runs in its own context.
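
For context, the change amounts to something like this (a simplified
sketch, not the exact diff):

static void nvmet_rdma_delete_ctrl(struct nvmet_ctrl *ctrl)
{
        struct nvmet_rdma_queue *queue, *tmp;

        mutex_lock(&nvmet_rdma_queue_mutex);
        list_for_each_entry_safe(queue, tmp, &nvmet_rdma_queue_list,
                        queue_list) {
                if (queue->nvme_sq.ctrl != ctrl)
                        continue;
                list_del_init(&queue->queue_list);
                /*
                 * Disconnect while still holding the mutex: the CM
                 * handler that also takes nvmet_rdma_queue_mutex runs
                 * in its own context, so this cannot self-deadlock.
                 */
                __nvmet_rdma_queue_disconnect(queue);
        }
        mutex_unlock(&nvmet_rdma_queue_mutex);
}

The old code instead dropped the mutex and restarted the walk from the
list head after every matching queue, which is what made delete_ctrl
O(n^2) in the number of queues.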

The same pattern is already used in nvmet_rdma_destroy_port_queues() and
nvmet_rdma_remove_one().

>   Did you test this extensively with lockdep enabled?

I can't say this is extensively tested. I can run blktests and make sure
lockdep is enabled.
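
(Concretely, something along the lines of:

        nvme_trtype=rdma ./check nvme

from the blktests tree, on a kernel built with CONFIG_PROVE_LOCKING=y so
that lockdep is active.)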

>
>> +	list_for_each_entry_safe(queue, tmp, &nvmet_rdma_queue_list, queue_list) {
> Nit: overly long line here.  Maybe just rename tmp to n?

Sure.
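
i.e. something like:

        list_for_each_entry_safe(queue, n, &nvmet_rdma_queue_list, queue_list) {

which also shortens the line.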


