[PATCH] nvmet-tcp: avoid circular locking dependency on install_queue()
Sagi Grimberg
sagi at grimberg.me
Thu Aug 17 04:43:29 PDT 2023
>>>>> nvmet_tcp_install_queue() is driven from the ->io_work workqueue
>>>>> function, but calls flush_workqueue(), which might trigger
>>>>> ->release_work(), which itself calls flush_work() on ->io_work.
>>>>>
>>>>> To avoid that, check for queues in disconnecting state and return
>>>>> 'controller busy' until all disconnects have completed.
>>>>>
>>>>> Signed-off-by: Hannes Reinecke <hare at suse.de>
>>>>> ---
>>>>> drivers/nvme/target/tcp.c | 14 ++++++++++++--
>>>>> 1 file changed, 12 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
>>>>> index f131b504aa01..97d07488072d 100644
>>>>> --- a/drivers/nvme/target/tcp.c
>>>>> +++ b/drivers/nvme/target/tcp.c
>>>>> @@ -1845,8 +1845,18 @@ static u16 nvmet_tcp_install_queue(struct nvmet_sq *sq)
>>>>>  		container_of(sq, struct nvmet_tcp_queue, nvme_sq);
>>>>>
>>>>>  	if (sq->qid == 0) {
>>>>> -		/* Let inflight controller teardown complete */
>>>>> -		flush_workqueue(nvmet_wq);
>>>>> +		struct nvmet_tcp_queue *q;
>>>>> +		int pending = 0;
>>>>> +
>>>>> +		mutex_lock(&nvmet_tcp_queue_mutex);
>>>>> +		list_for_each_entry(q, &nvmet_tcp_queue_list, queue_list) {
>>>>> +			if (q->state == NVMET_TCP_Q_DISCONNECTING)
>>>>> +				pending++;
>>>>> +		}
>>>>> +		mutex_unlock(&nvmet_tcp_queue_mutex);
>>>>> +		/* Retry for pending controller teardown */
>>>>> +		if (pending)
>>>>> +			return NVME_SC_CONNECT_CTRL_BUSY;
>>>>
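[For reference, the dependency cycle described in the commit message,
sketched as a call chain. The handler names nvmet_tcp_io_work() and
nvmet_tcp_release_queue_work() are my reading of which functions back
->io_work and ->release_work; the cycle itself is taken from the message
above:

    nvmet_tcp_io_work()                    /* ->io_work handler */
      nvmet_tcp_install_queue(sq)          /* admin connect, qid == 0 */
        flush_workqueue(nvmet_wq)          /* waits for all nvmet_wq items... */
          nvmet_tcp_release_queue_work()   /* ...including ->release_work, */
            flush_work(&queue->io_work)    /* which waits on the io_work
                                              instance running above: deadlock */
]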
>>>> Overall I'm fine with the approach.
>>>>
>>>> That will most likely be the case for a controller reset or a "reset
>>>> storm". Maybe it is better to allow some reasonable number of pending
>>>> teardowns, say 10 or so (see the sketch below)... The whole point is
>>>> to not let the removals accumulate while we continue creating new
>>>> ones...
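[A minimal sketch of that thresholded variant on top of the patch above;
the NVMET_TCP_BACKLOG name and the value 10 are placeholders of mine, not
an existing define:

    #define NVMET_TCP_BACKLOG	10	/* hypothetical teardown backlog cap */

    	mutex_lock(&nvmet_tcp_queue_mutex);
    	list_for_each_entry(q, &nvmet_tcp_queue_list, queue_list) {
    		if (q->state == NVMET_TCP_Q_DISCONNECTING)
    			pending++;
    	}
    	mutex_unlock(&nvmet_tcp_queue_mutex);

    	/* Push back only once teardowns start to pile up */
    	if (pending > NVMET_TCP_BACKLOG)
    		return NVME_SC_CONNECT_CTRL_BUSY;
]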
>>>>
>>>> Can you please test this with a host that does a loop like:
>>>> while true; do nvme reset /dev/nvme0; done
>>>>
>>>> And let it run for some time...
>>>> Or even better, create a test that does this for 100 iterations,
>>>> or long/short variants of this?
>>>>
>>>> The same would be needed by nvmet-rdma btw.
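[For nvmet-rdma, the analogous check in nvmet_rdma_install_queue() might
look like the sketch below, assuming we key off the driver's existing
nvmet_rdma_queue_list / nvmet_rdma_queue_mutex and its
NVMET_RDMA_Q_DISCONNECTING state:

    	struct nvmet_rdma_queue *q;
    	int pending = 0;

    	mutex_lock(&nvmet_rdma_queue_mutex);
    	list_for_each_entry(q, &nvmet_rdma_queue_list, queue_list) {
    		if (q->state == NVMET_RDMA_Q_DISCONNECTING)
    			pending++;
    	}
    	mutex_unlock(&nvmet_rdma_queue_mutex);
    	if (pending)
    		return NVME_SC_CONNECT_CTRL_BUSY;
]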
>>>
>>> Retested with the reset loop; no issues found after 327 iterations.
>>
>> Interesting, did the host see a NVME_SC_CONNECT_CTRL_BUSY during the
>> test?
>
> Nope.
So you are not stressing it enough?
Before (5+ years ago), multiple reporters, including myself, ran a
similar test and got to OOM after enough time.
If you are not exercising the busy return, you are not getting to a
point where you need to back-pressure...