[PATCH] nvmet-tcp: avoid circular locking dependency on install_queue()
Sagi Grimberg
sagi at grimberg.me
Thu Aug 17 04:43:29 PDT 2023
>>>>> nvmet_tcp_install_queue() is driven from the ->io_work workqueue
>>>>> function, but calls flush_workqueue(), which might trigger
>>>>> ->release_work(), which itself calls flush_work() on ->io_work.
>>>>>
>>>>> To avoid that, check for queues in disconnecting state and return
>>>>> 'controller busy' until all disconnects have completed.
>>>>>
>>>>> Signed-off-by: Hannes Reinecke <hare at suse.de>
>>>>> ---
>>>>> drivers/nvme/target/tcp.c | 14 ++++++++++++--
>>>>> 1 file changed, 12 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
>>>>> index f131b504aa01..97d07488072d 100644
>>>>> --- a/drivers/nvme/target/tcp.c
>>>>> +++ b/drivers/nvme/target/tcp.c
>>>>> @@ -1845,8 +1845,18 @@ static u16 nvmet_tcp_install_queue(struct nvmet_sq *sq)
>>>>>  		container_of(sq, struct nvmet_tcp_queue, nvme_sq);
>>>>>
>>>>>  	if (sq->qid == 0) {
>>>>> -		/* Let inflight controller teardown complete */
>>>>> -		flush_workqueue(nvmet_wq);
>>>>> +		struct nvmet_tcp_queue *q;
>>>>> +		int pending = 0;
>>>>> +
>>>>> +		mutex_lock(&nvmet_tcp_queue_mutex);
>>>>> +		list_for_each_entry(q, &nvmet_tcp_queue_list, queue_list) {
>>>>> +			if (q->state == NVMET_TCP_Q_DISCONNECTING)
>>>>> +				pending++;
>>>>> +		}
>>>>> +		mutex_unlock(&nvmet_tcp_queue_mutex);
>>>>> +		/* Retry for pending controller teardown */
>>>>> +		if (pending)
>>>>> +			return NVME_SC_CONNECT_CTRL_BUSY;
>>>>
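[For reference, the dependency cycle described in the commit message,
sketched as a call chain. The handler names nvmet_tcp_io_work() and
nvmet_tcp_release_queue_work() are my reading of which functions back
->io_work and ->release_work; the cycle itself is taken from the message
above:

    nvmet_tcp_io_work()                    /* ->io_work handler */
      nvmet_tcp_install_queue(sq)          /* admin connect, qid == 0 */
        flush_workqueue(nvmet_wq)          /* waits for all nvmet_wq items... */
          nvmet_tcp_release_queue_work()   /* ...including ->release_work, */
            flush_work(&queue->io_work)    /* which waits on the io_work
                                              instance running above: deadlock */
]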
>>>> Overall I'm fine with the approach.
>>>>
>>>> That will most likely be the case for a controller reset or a "reset
>>>> storm". Maybe it is better to allow some reasonable number of pending
>>>> teardowns, say 10 or so (see the sketch below)... The whole point is
>>>> to not let the removals accumulate while we continue creating new
>>>> ones...
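[A minimal sketch of that thresholded variant on top of the patch above;
the NVMET_TCP_BACKLOG name and the value 10 are placeholders of mine, not
an existing define:

    #define NVMET_TCP_BACKLOG	10	/* hypothetical teardown backlog cap */

    	mutex_lock(&nvmet_tcp_queue_mutex);
    	list_for_each_entry(q, &nvmet_tcp_queue_list, queue_list) {
    		if (q->state == NVMET_TCP_Q_DISCONNECTING)
    			pending++;
    	}
    	mutex_unlock(&nvmet_tcp_queue_mutex);

    	/* Push back only once teardowns start to pile up */
    	if (pending > NVMET_TCP_BACKLOG)
    		return NVME_SC_CONNECT_CTRL_BUSY;
]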
>>>>
>>>> Can you please test this with a host that does a loop like:
>>>> while true; do nvme reset /dev/nvme0; done
>>>>
>>>> And let it run for some time...
>>>> Or even better, create a test that does this for 100 iterations,
>>>> or long/short variants of this?
>>>>
>>>> The same would be needed by nvmet-rdma btw.
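[For nvmet-rdma, the analogous check in nvmet_rdma_install_queue() might
look like the sketch below, assuming we key off the driver's existing
nvmet_rdma_queue_list / nvmet_rdma_queue_mutex and its
NVMET_RDMA_Q_DISCONNECTING state:

    	struct nvmet_rdma_queue *q;
    	int pending = 0;

    	mutex_lock(&nvmet_rdma_queue_mutex);
    	list_for_each_entry(q, &nvmet_rdma_queue_list, queue_list) {
    		if (q->state == NVMET_RDMA_Q_DISCONNECTING)
    			pending++;
    	}
    	mutex_unlock(&nvmet_rdma_queue_mutex);
    	if (pending)
    		return NVME_SC_CONNECT_CTRL_BUSY;
]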
>>>
>>> Retested with the reset loop; no issues found after 327 iterations.
>>
>> Interesting, did the host see a NVME_SC_CONNECT_CTRL_BUSY during the
>> test?
>
> Nope.
So you are not stressing it enough?
Before (5+ years ago), multiple reporters, including myself, ran a
similar test and got to OOM after enough time.
If you are not exercising the busy return, you are not getting to a
point where you need to back-pressure...