[PATCH 1/2] nvmet-rdma: avoid circular locking dependency on install_queue()

Hannes Reinecke hare at suse.de
Mon Dec 4 03:49:14 PST 2023


On 12/4/23 11:19, Sagi Grimberg wrote:
> 
> 
> On 11/20/23 15:48, Sagi Grimberg wrote:
>>
>>>> According to 777dc82395de ("nvmet-rdma: occasionally flush ongoing
>>>> controller teardown") this is just for reducing the memory footprint.
>>>> Wonder if we need to bother, and whether it won't be better to remove
>>>> the whole thing entirely.
>>>
>>> Well, Sagi added it, so I'll let him chime in on the usefulness.
>>
>> While I don't like having nvmet arbitrarily replying busy and instead
>> have lockdep simply just accept that its not a deadlock here, but we can
>> simply just sidetrack it as proposed I guess.
>>
>> But Hannes, this is on the other extreme.. Now every connect that nvmet
>> gets, if there is even a single queue that is disconnecting (global
>> scope), then the host is denied. Lets give it a sane backlog.
>> We use rdma_listen backlog of 128, so maybe stick with this magic
>> number... This way we are busy only if more than 128 queues are tearing
>> down to prevent the memory footprint from exploding, and hopefully it is
>> rare enough that the normal host does not see an arbitrary busy
>> rejection.
>>
>> Same comment for nvmet-tcp.
> 
> Hey Hannes, anything happened with this one?
> 
> Overall I think that the approach is fine, but I do think
> that we need a backlog for it.

Hmm. The main issue here is the call to 'flush_workqueue()', which 
triggers the circular locking warning. So a backlog would only help
us so much; the real fix is to get rid of the flush_workqueue()
call entirely.

What I can do is add this:

diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index 4cc27856aa8f..72bcc54701a0 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c
@@ -2119,8 +2119,20 @@ static u16 nvmet_tcp_install_queue(struct nvmet_sq *sq)
                 container_of(sq, struct nvmet_tcp_queue, nvme_sq);

         if (sq->qid == 0) {
+               struct nvmet_tcp_queue *q;
+               int pending = 0;
+
                 /* Let inflight controller teardown complete */
-               flush_workqueue(nvmet_wq);
+               mutex_lock(&nvmet_tcp_queue_mutex);
+               list_for_each_entry(q, &nvmet_tcp_queue_list, queue_list) {
+                       if (q->nvme_sq.ctrl == sq->ctrl &&
+                           q->state == NVMET_TCP_Q_DISCONNECTING)
+                               pending++;
+               }
+               mutex_unlock(&nvmet_tcp_queue_mutex);
+               /* Retry for pending controller teardown */
+               if (pending)
+                       return NVME_SC_CONNECT_CTRL_BUSY;
         }

which then would only affect the controller we're connecting to.
Hmm?
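
For illustration, the decision logic above, combined with the backlog
of 128 that Sagi suggested (matching the rdma_listen() backlog), can be
modeled as a standalone sketch. The struct, the BACKLOG constant, and
install_would_be_busy() are hypothetical stand-ins for the kernel's
nvmet_tcp_queue list walk, not the actual patch:

```c
#include <stddef.h>

enum q_state { Q_CONNECTING, Q_LIVE, Q_DISCONNECTING };

/* Simplified stand-in for struct nvmet_tcp_queue */
struct queue {
	int ctrl_id;            /* models q->nvme_sq.ctrl */
	enum q_state state;
};

#define BACKLOG 128             /* the rdma_listen() backlog magic number */

/*
 * Return nonzero (reject with BUSY) only if more than BACKLOG queues
 * belonging to the same controller are still tearing down; otherwise
 * let the connect proceed without flushing the workqueue.
 */
static int install_would_be_busy(const struct queue *qs, size_t n,
				 int ctrl_id)
{
	size_t pending = 0;

	for (size_t i = 0; i < n; i++)
		if (qs[i].ctrl_id == ctrl_id &&
		    qs[i].state == Q_DISCONNECTING)
			pending++;
	return pending > BACKLOG;
}
```

With this shape, a host only ever sees NVME_SC_CONNECT_CTRL_BUSY when
its own controller has an unusually large teardown backlog, rather than
whenever any queue anywhere in the target is disconnecting.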

Cheers,

Hannes




More information about the Linux-nvme mailing list