[PATCH 1/2] nvmet-rdma: avoid circular locking dependency on install_queue()

Sagi Grimberg sagi at grimberg.me
Mon Dec 4 03:57:32 PST 2023



On 12/4/23 13:49, Hannes Reinecke wrote:
> On 12/4/23 11:19, Sagi Grimberg wrote:
>>
>>
>> On 11/20/23 15:48, Sagi Grimberg wrote:
>>>
>>>>> According to 777dc82395de ("nvmet-rdma: occasionally flush ongoing
>>>>> controller teardown"), this is just for reducing the memory footprint.
>>>>> I wonder whether we need to bother, and whether it wouldn't be better
>>>>> to remove the whole thing entirely.
>>>>
>>>> Well, Sagi added it, so I'll let him chime in on the usefulness.
>>>
>>> While I don't like having nvmet arbitrarily reply busy, and would
>>> rather have lockdep simply accept that it's not a deadlock here, I
>>> guess we can sidetrack it as proposed.
>>>
>>> But Hannes, this is the other extreme... Now, for every connect that
>>> nvmet gets, if there is even a single queue that is disconnecting
>>> (global scope), the host is denied. Let's give it a sane backlog.
>>> We use an rdma_listen backlog of 128, so maybe stick with that magic
>>> number... That way we reply busy only if more than 128 queues are
>>> tearing down, which still prevents the memory footprint from
>>> exploding, and is hopefully rare enough that a normal host never sees
>>> an arbitrary busy rejection.
>>>
>>> Same comment for nvmet-tcp.
>>
>> Hey Hannes, did anything happen with this one?
>>
>> Overall I think that the approach is fine, but I do think
>> that we need a backlog for it.
> 
> Hmm. The main issue here is the call to flush_workqueue(), which
> triggers the circular locking warning. So a ratelimit would only help
> us so much; the real fix is to get rid of the flush_workqueue() call
> entirely.
> 
> What I can do is add this:
> 
> diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
> index 4cc27856aa8f..72bcc54701a0 100644
> --- a/drivers/nvme/target/tcp.c
> +++ b/drivers/nvme/target/tcp.c
> @@ -2119,8 +2119,20 @@ static u16 nvmet_tcp_install_queue(struct nvmet_sq *sq)
>                  container_of(sq, struct nvmet_tcp_queue, nvme_sq);
> 
>          if (sq->qid == 0) {
> +               struct nvmet_tcp_queue *q;
> +               int pending = 0;
> +
>                  /* Let inflight controller teardown complete */
> -               flush_workqueue(nvmet_wq);
> +               mutex_lock(&nvmet_tcp_queue_mutex);
> +               list_for_each_entry(q, &nvmet_tcp_queue_list, queue_list) {
> +                       if (q->nvme_sq.ctrl == sq->ctrl &&
> +                           q->state == NVMET_TCP_Q_DISCONNECTING)
> +                               pending++;
> +               }
> +               mutex_unlock(&nvmet_tcp_queue_mutex);
> +               /* Retry for pending controller teardown */
> +               if (pending)
> +                       return NVME_SC_CONNECT_CTRL_BUSY;
>          }
> 
> which then would only affect the controller we're connecting to.
> Hmm?

Still, I think we should allow a reasonable backlog. There is no reason
to be this restrictive: we may hit this more often than we'd like, and
the sole purpose here is to avoid a memory overrun.
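
Something like the below, as an untested sketch on top of your patch
(NVMET_TCP_BACKLOG is a new define here, picked to match the listen
backlog of 128 we already use):

#define NVMET_TCP_BACKLOG	128

	if (sq->qid == 0) {
		struct nvmet_tcp_queue *q;
		int pending = 0;

		/* Count teardowns in flight for this controller */
		mutex_lock(&nvmet_tcp_queue_mutex);
		list_for_each_entry(q, &nvmet_tcp_queue_list, queue_list) {
			if (q->nvme_sq.ctrl == sq->ctrl &&
			    q->state == NVMET_TCP_Q_DISCONNECTING)
				pending++;
		}
		mutex_unlock(&nvmet_tcp_queue_mutex);

		/* Only push back once the backlog is exceeded */
		if (pending > NVMET_TCP_BACKLOG)
			return NVME_SC_CONNECT_CTRL_BUSY;
	}

That way we never flush the workqueue, and a host only sees a busy
rejection when teardowns are really piling up.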


