[PATCH] nvmet-rdma: Release connections synchronously
Sagi Grimberg
sagi at grimberg.me
Sun May 21 23:41:13 PDT 2023
>>> @@ -1582,11 +1566,6 @@ static int nvmet_rdma_queue_connect(struct rdma_cm_id *cm_id,
>>> goto put_device;
>>> }
>>> - if (queue->host_qid == 0) {
>>> - /* Let inflight controller teardown complete */
>>> - flush_workqueue(nvmet_wq);
>>> - }
>>> -
>>> ret = nvmet_rdma_cm_accept(cm_id, queue, &event->param.conn);
>>> if (ret) {
>>> /*
>>
>> You could have simply removed this hunk alone to make lockdep quiet on
>> this, without the need to rework the async queue removal.
>>
>> The flush here was added to prevent a reset/connect/disconnect storm
>> causing the target to run out of resources (which we have seen reports
>> about in the distant past). What prevents it now?
>>
>> And you both reworked the teardown and still removed the flush; I
>> don't get why both are needed.
>
> Hi Sagi,
>
> My understanding is that the above flush_workqueue() call waits for
> prior release_work to complete. If the release_work instance is removed,
> I don't think that the above flush_workqueue() call is still necessary.
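To make the pairing concrete, the code being removed looked roughly
like this (a simplified sketch using the real nvmet-rdma names, not
the full code):

	/* teardown runs in process context on nvmet_wq */
	static void nvmet_rdma_release_queue_work(struct work_struct *w)
	{
		/* free the queue and, if needed, the controller */
	}

	/* disconnect path: defer the teardown */
	queue_work(nvmet_wq, &queue->release_work);

	/* connect path: the flush paired with the queue_work above so
	 * a reset/connect/disconnect storm could not allocate new
	 * queues faster than old ones were being freed
	 */
	if (queue->host_qid == 0)
		flush_workqueue(nvmet_wq);
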
That said, I'm wondering if making delete_ctrl synchronous may be a
problem in some cases. nvmet_fatal_error can be triggered from the
rdma completion path (rdma core workqueue context), so there may be
some inter-dependency there...
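Something along these lines is what worries me (a hypothetical
sketch; the helper names are illustrative, not the actual call chain):

	static void fatal_error_sketch(struct nvmet_ctrl *ctrl)
	{
		/*
		 * With delete_ctrl made synchronous, this waits for the
		 * full controller teardown. If that teardown ever needs
		 * to flush or wait on the same rdma core workqueue we
		 * are currently running on, we deadlock on ourselves.
		 */
		ctrl->ops->delete_ctrl(ctrl);
	}

	/* runs on an rdma core workqueue when a completion fails */
	static void error_completion_sketch(struct nvmet_ctrl *ctrl)
	{
		fatal_error_sketch(ctrl);
	}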
Also, nvmet-tcp has a similar lockdep complaint: there the teardown
goes via socket shutdown, and it has to stay async because we cannot
release the socket from within the state_change callback.
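The tcp side looks roughly like this (a simplified sketch of
nvmet_tcp_state_change; the per-state handling is omitted):

	static void nvmet_tcp_state_change(struct sock *sk)
	{
		struct nvmet_tcp_queue *queue;

		read_lock_bh(&sk->sk_callback_lock);
		queue = sk->sk_user_data;
		/*
		 * We are inside the socket's state_change callback with
		 * sk_callback_lock held, so releasing the socket here is
		 * not an option; all we can do is kick the release work.
		 */
		if (queue)
			queue_work(nvmet_wq, &queue->release_work);
		read_unlock_bh(&sk->sk_callback_lock);
	}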