[PATCH] nvmet-tcp: avoid circular locking dependency on install_queue()
Hannes Reinecke
hare at suse.de
Thu Aug 17 04:11:54 PDT 2023
On 8/17/23 10:54, Sagi Grimberg wrote:
>
>
> On 8/10/23 16:20, Hannes Reinecke wrote:
>> nvmet_tcp_install_queue() is driven from the ->io_work workqueue
>> function, but will call flush_workqueue() which might trigger
>> ->release_work() which in itself calls flush_work on ->io_work.
>>
>> To avoid that check for pending queue in disconnecting status,
>> and return 'controller busy' until all disconnects are completed.
>>
>> Signed-off-by: Hannes Reinecke <hare at suse.de>
>> ---
>> drivers/nvme/target/tcp.c | 14 ++++++++++++--
>> 1 file changed, 12 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
>> index f131b504aa01..97d07488072d 100644
>> --- a/drivers/nvme/target/tcp.c
>> +++ b/drivers/nvme/target/tcp.c
>> @@ -1845,8 +1845,18 @@ static u16 nvmet_tcp_install_queue(struct
>> nvmet_sq *sq)
>> container_of(sq, struct nvmet_tcp_queue, nvme_sq);
>> if (sq->qid == 0) {
>> - /* Let inflight controller teardown complete */
>> - flush_workqueue(nvmet_wq);
>> + struct nvmet_tcp_queue *q;
>> + int pending = 0;
>> +
>> + mutex_lock(&nvmet_tcp_queue_mutex);
>> + list_for_each_entry(q, &nvmet_tcp_queue_list, queue_list) {
>> + if (q->state == NVMET_TCP_Q_DISCONNECTING)
>> + pending++;
>> + }
>> + mutex_unlock(&nvmet_tcp_queue_mutex);
>> + /* Retry for pending controller teardown */
>> + if (pending)
>> + return NVME_SC_CONNECT_CTRL_BUSY;
>
> Overall I'm fine with the approach.
>
> That will most likely be the case for a controller reset or a "reset
> storm". Maybe it is better to keep it at some reasonably allowed pending
> to say 10 or something... The whole point is to not let the removals
> accumulate and continue creating new ones...
>
> Can you please test this with a host that does a loop like:
> while true; do nvme reset /dev/nvme0; done
>
> And let it run for some time...
> Or even better create a test that does this for 100 iterations
> or a log/short variants of this?
>
> The same would be needed by nvmet-rdma btw.
Retested with the reset loop; no issues found after 327 iterations.
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare at suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman
More information about the Linux-nvme
mailing list