[PATCH 2/2] nvmet-rdma: avoid circular locking dependency on install_queue()
Sagi Grimberg
sagi at grimberg.me
Sun Jun 2 23:43:38 PDT 2024
On 02/06/2024 16:09, Max Gurtovoy wrote:
> Hi Hannes/Sagi,
>
> sorry for the late response on this one.
>
> On 08/12/2023 14:53, hare at kernel.org wrote:
>> From: Hannes Reinecke <hare at suse.de>
>>
>> nvmet_rdma_install_queue() is driven from the ->io_work workqueue
>
> nvmet_rdma_install_queue callback is not implemented in the driver.
> And RDMA doesn't use an io_work workqueue.
>
>
>> function, but will call flush_workqueue(), which might trigger
>> ->release_work(), which itself calls flush_work() on ->io_work.
>>
>> To avoid that, check for pending queues in disconnecting status, and
>> return 'controller busy' when we reach a certain threshold.
>>
>> Signed-off-by: Hannes Reinecke <hare at suse.de>
>> Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki at wdc.com>
>> ---
>> drivers/nvme/target/rdma.c | 19 ++++++++++++++++---
>> 1 file changed, 16 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
>> index 4597bca43a6d..667f9c04f35d 100644
>> --- a/drivers/nvme/target/rdma.c
>> +++ b/drivers/nvme/target/rdma.c
>> @@ -37,6 +37,8 @@
>> #define NVMET_RDMA_MAX_MDTS 8
>> #define NVMET_RDMA_MAX_METADATA_MDTS 5
>> +#define NVMET_RDMA_BACKLOG 128
>> +
>> struct nvmet_rdma_srq;
>> struct nvmet_rdma_cmd {
>> @@ -1583,8 +1585,19 @@ static int nvmet_rdma_queue_connect(struct rdma_cm_id *cm_id,
>> }
>> if (queue->host_qid == 0) {
>> - /* Let inflight controller teardown complete */
>> - flush_workqueue(nvmet_wq);
>> + struct nvmet_rdma_queue *q;
>> + int pending = 0;
>> +
>> + /* Check for pending controller teardown */
>> + mutex_lock(&nvmet_rdma_queue_mutex);
>> + list_for_each_entry(q, &nvmet_rdma_queue_list, queue_list) {
>> + if (q->nvme_sq.ctrl == queue->nvme_sq.ctrl &&
>
> nvme_sq->ctrl pointer is set upon FABRICS_CONNECT during
> nvmet_install_queue().
>
> obviously this check is not relevant since queue->nvme_sq.ctrl is
> always NULL, isn't it?
I think that the goal was to look at all the queues that were scheduled
for teardown (i.e. queues with ->release_work pending), so sq->ctrl
should be valid for those queues afaict.
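
To make the ordering explicit, here is a rough timeline of where
nvme_sq.ctrl gets set and torn down, as I understand it (not verified
against the current tree):

/*
 * nvmet_rdma_queue_connect()     - new queue; its nvme_sq.ctrl is still
 *                                  NULL when this backlog check runs
 * nvmet_install_queue()          - later, on the fabrics CONNECT, sets
 *                                  sq->ctrl for that queue (as Max notes)
 * nvmet_rdma_queue_disconnect()  - queue moves to NVMET_RDMA_Q_DISCONNECTING
 *                                  and release_work is scheduled; sq->ctrl
 *                                  should still be valid until the teardown
 *                                  path (nvmet_sq_destroy()) drops it
 */

If that is right, the questionable part is the comparison against the
brand-new queue's nvme_sq.ctrl, not the validity of q->nvme_sq.ctrl for
the queues already on the list.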
>
> So, I wonder what this check is doing?
>
> Was the intention to check that port->cm_id (listener) is not
> overloaded?
>
> Also, the return value is a "FABRICS" return code and not the value that
> rdma_cm is looking for.
>
> We probably should introduce nvmet_rdma_install_queue() and check the
> status of pending disconnections associated with port->cm_id, which is
> the one with the configured backlog.
>
> I don't understand the solution for nvmet_tcp either.
>
> I think this series only removed the "flush_workqueue(nvmet_wq);"
> actually.
>
> Is there something I'm missing in this commit?
IIRC the goal was to get rid of the circular locking lockdep complaint.
nvmet needs a back-pressure mechanism for incoming connections because
queue teardown is async, and nvmet may hit OOM in a connect-disconnect
storm scenario. The goal here was to detect such a scenario by counting
the number of queues that are queued for teardown and, once they exceed
a certain threshold, returning a BUSY status to the incoming fabrics
connect.

Before, nvmet flushed the pending queue teardowns, but that triggered a
lockdep complaint, and Hannes attempted to address it.
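
To make the intent concrete, here is a condensed, compile-untested sketch
of that back-pressure check, built only from the names used in the quoted
hunk; the helper name is made up here, and whether the per-ctrl comparison
belongs in it is exactly the open question above:

/*
 * Hypothetical helper, condensed from the quoted hunk: count queues
 * whose teardown (release_work) has not completed yet and refuse new
 * admin-queue connects once we exceed the listen backlog.
 */
static bool nvmet_rdma_teardown_backlogged(void)
{
	struct nvmet_rdma_queue *q;
	int pending = 0;

	mutex_lock(&nvmet_rdma_queue_mutex);
	list_for_each_entry(q, &nvmet_rdma_queue_list, queue_list) {
		if (q->state == NVMET_RDMA_Q_DISCONNECTING)
			pending++;
	}
	mutex_unlock(&nvmet_rdma_queue_mutex);

	return pending > NVMET_RDMA_BACKLOG;
}

The caller would then do something like
"if (queue->host_qid == 0 && nvmet_rdma_teardown_backlogged())
	return NVME_SC_CONNECT_CTRL_BUSY;",
modulo Max's point above that rdma_cm expects its own reject reason
rather than a fabrics status code.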
>
>> + q->state == NVMET_RDMA_Q_DISCONNECTING)
>> + pending++;
>> + }
>> + mutex_unlock(&nvmet_rdma_queue_mutex);
>> + if (pending > NVMET_RDMA_BACKLOG)
>> + return NVME_SC_CONNECT_CTRL_BUSY;
>> }
>> ret = nvmet_rdma_cm_accept(cm_id, queue, &event->param.conn);
>> @@ -1880,7 +1893,7 @@ static int nvmet_rdma_enable_port(struct nvmet_rdma_port *port)
>> goto out_destroy_id;
>> }
>> - ret = rdma_listen(cm_id, 128);
>> + ret = rdma_listen(cm_id, NVMET_RDMA_BACKLOG);
>> if (ret) {
>> pr_err("listening to %pISpcs failed (%d)\n", addr, ret);
>> goto out_destroy_id;