[PATCH] nvmet-tcp: avoid circular locking dependency on install_queue()

Hannes Reinecke hare at suse.de
Thu Aug 17 04:11:54 PDT 2023


On 8/17/23 10:54, Sagi Grimberg wrote:
> 
> 
> On 8/10/23 16:20, Hannes Reinecke wrote:
>> nvmet_tcp_install_queue() is driven from the ->io_work workqueue
>> function, but will call flush_workqueue() which might trigger
>> ->release_work() which in itself calls flush_work on ->io_work.
>>
>> To avoid that check for pending queue in disconnecting status,
>> and return 'controller busy' until all disconnects are completed.
>>
>> Signed-off-by: Hannes Reinecke <hare at suse.de>
>> ---
>>   drivers/nvme/target/tcp.c | 14 ++++++++++++--
>>   1 file changed, 12 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
>> index f131b504aa01..97d07488072d 100644
>> --- a/drivers/nvme/target/tcp.c
>> +++ b/drivers/nvme/target/tcp.c
>> @@ -1845,8 +1845,18 @@ static u16 nvmet_tcp_install_queue(struct 
>> nvmet_sq *sq)
>>           container_of(sq, struct nvmet_tcp_queue, nvme_sq);
>>       if (sq->qid == 0) {
>> -        /* Let inflight controller teardown complete */
>> -        flush_workqueue(nvmet_wq);
>> +        struct nvmet_tcp_queue *q;
>> +        int pending = 0;
>> +
>> +        mutex_lock(&nvmet_tcp_queue_mutex);
>> +        list_for_each_entry(q, &nvmet_tcp_queue_list, queue_list) {
>> +            if (q->state == NVMET_TCP_Q_DISCONNECTING)
>> +                pending++;
>> +        }
>> +        mutex_unlock(&nvmet_tcp_queue_mutex);
>> +        /* Retry for pending controller teardown */
>> +        if (pending)
>> +            return NVME_SC_CONNECT_CTRL_BUSY;
> 
> Overall I'm fine with the approach.
> 
> That will most likely be the case for a controller reset or a "reset
> storm". Maybe it is better to keep it at some reasonably allowed pending
> to say 10 or something... The whole point is to not let the removals
> accumulate and continue creating new ones...
> 
> Can you please test this with a host that does a loop like:
> while true; do nvme reset /dev/nvme0; done
> 
> And let it run for some time...
> Or even better create a test that does this for 100 iterations
> or a log/short variants of this?
> 
> The same would be needed by nvmet-rdma btw.

Retested with the reset loop; no issues found after 327 iterations.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare at suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman




More information about the Linux-nvme mailing list