[PATCH 2/2] nvmet-rdma: avoid circular locking dependency on install_queue()

Sagi Grimberg sagi at grimberg.me
Sun Jun 2 23:43:38 PDT 2024



On 02/06/2024 16:09, Max Gurtovoy wrote:
> Hi Hannes/Sagi,
>
> sorry for the late response on this one.
>
> On 08/12/2023 14:53, hare at kernel.org wrote:
>> From: Hannes Reinecke<hare at suse.de>
>>
>> nvmet_rdma_install_queue() is driven from the ->io_work workqueue
>
> The nvmet_rdma_install_queue callback is not implemented in the
> driver, and RDMA doesn't use an io_work workqueue.
>
>
>> function, but will call flush_workqueue() which might trigger
>> ->release_work() which in itself calls flush_work on ->io_work.
>>
>> To avoid that, check for pending queues in disconnecting status,
>> and return 'controller busy' when we reach a certain threshold.
>>
>> Signed-off-by: Hannes Reinecke<hare at suse.de>
>> Tested-by: Shin'ichiro Kawasaki<shinichiro.kawasaki at wdc.com>
>> ---
>>   drivers/nvme/target/rdma.c | 19 ++++++++++++++++---
>>   1 file changed, 16 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
>> index 4597bca43a6d..667f9c04f35d 100644
>> --- a/drivers/nvme/target/rdma.c
>> +++ b/drivers/nvme/target/rdma.c
>> @@ -37,6 +37,8 @@
>>   #define NVMET_RDMA_MAX_MDTS            8
>>   #define NVMET_RDMA_MAX_METADATA_MDTS        5
>>
>> +#define NVMET_RDMA_BACKLOG 128
>> +
>>   struct nvmet_rdma_srq;
>>
>>   struct nvmet_rdma_cmd {
>> @@ -1583,8 +1585,19 @@ static int nvmet_rdma_queue_connect(struct rdma_cm_id *cm_id,
>>       }
>>
>>       if (queue->host_qid == 0) {
>> -        /* Let inflight controller teardown complete */
>> -        flush_workqueue(nvmet_wq);
>> +        struct nvmet_rdma_queue *q;
>> +        int pending = 0;
>> +
>> +        /* Check for pending controller teardown */
>> +        mutex_lock(&nvmet_rdma_queue_mutex);
>> +        list_for_each_entry(q, &nvmet_rdma_queue_list, queue_list) {
>> +            if (q->nvme_sq.ctrl == queue->nvme_sq.ctrl &&
>
> The nvme_sq->ctrl pointer is set upon FABRICS_CONNECT during
> nvmet_install_queue().
>
> Obviously this check is not relevant, since queue->nvme_sq.ctrl is
> always NULL here, isn't it?

I think the goal was to look at all the queues that were already
scheduled for teardown (queue->release_work), so sq->ctrl should be
valid for those queues afaict.
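
To be clear about the intent (a rough, hypothetical sketch only, not
what was posted): if the ctrl comparison really is meaningless at this
point, the check reduces to counting every queue that is already in
teardown, something like:

	struct nvmet_rdma_queue *q;
	int pending = 0;

	/* count queues already scheduled for async teardown */
	mutex_lock(&nvmet_rdma_queue_mutex);
	list_for_each_entry(q, &nvmet_rdma_queue_list, queue_list) {
		if (q->state == NVMET_RDMA_Q_DISCONNECTING)
			pending++;
	}
	mutex_unlock(&nvmet_rdma_queue_mutex);

	if (pending > NVMET_RDMA_BACKLOG)
		return NVME_SC_CONNECT_CTRL_BUSY;

That keeps the back-pressure behaviour without relying on
queue->nvme_sq.ctrl being set on the incoming queue.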

>
> So, I wonder what this check is actually doing?
>
> Was the intention to check that port->cm_id (the listener) is not
> overloaded?
>
> Also, the return value is a "FABRICS" return code and not the value
> that rdma_cm is looking for.
>
> We probably should introduce nvmet_rdma_install_queue() and check the
> status of pending disconnections associated with port->cm_id, which is
> the one with the configured backlog.
>
> I also don't understand the corresponding solution for nvmet_tcp.
>
> I think this series actually only removed the
> "flush_workqueue(nvmet_wq);" call.
>
> Is there something I'm missing in this commit?

IIRC the goal was to get rid of the circular locking lockdep complaint.
nvmet needs a back-pressure mechanism for incoming connections because
queue teardown is async, and nvmet may hit OOM in a connect-disconnect
storm scenario. The goal here was to detect such a scenario by counting
the number of queues that are queued for teardown and, if they exceed a
certain threshold, returning a BUSY status to the incoming fabrics
connect.

Before, nvmet flushed the pending queue teardowns, but that triggered a
lockdep complaint, and Hannes attempted to address it.
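
If the objection is mainly the return value, the same pending count
could instead trigger a reject at the RDMA/CM level, mirroring the
existing !ndev error path in nvmet_rdma_queue_connect() (again just an
untested sketch; the check could run before the queue is even
allocated, so nothing needs to be unwound):

	if (pending > NVMET_RDMA_BACKLOG) {
		/* tell the initiator we are out of resources right now */
		nvmet_rdma_cm_reject(cm_id, NVME_RDMA_CM_NO_RSC);
		/* non-zero return lets the CM core drop the new cm_id */
		return -ECONNREFUSED;
	}

Whether NVME_RDMA_CM_NO_RSC is the right CM status, and whether the
count should be scoped to the port's listening cm_id as you suggest,
is open for discussion.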

>
>> +                q->state == NVMET_RDMA_Q_DISCONNECTING)
>> +                pending++;
>> +        }
>> +        mutex_unlock(&nvmet_rdma_queue_mutex);
>> +        if (pending > NVMET_RDMA_BACKLOG)
>> +            return NVME_SC_CONNECT_CTRL_BUSY;
>>       }
>>
>>       ret = nvmet_rdma_cm_accept(cm_id, queue, &event->param.conn);
>> @@ -1880,7 +1893,7 @@ static int nvmet_rdma_enable_port(struct nvmet_rdma_port *port)
>>           goto out_destroy_id;
>>       }
>>
>> -    ret = rdma_listen(cm_id, 128);
>> +    ret = rdma_listen(cm_id, NVMET_RDMA_BACKLOG);
>>       if (ret) {
>>           pr_err("listening to %pISpcs failed (%d)\n", addr, ret);
>>           goto out_destroy_id;



