blktests failures with v6.4

Shinichiro Kawasaki shinichiro.kawasaki at wdc.com
Wed Jul 12 18:22:32 PDT 2023


On Jul 09, 2023 / 17:32, Sagi Grimberg wrote:
> 
> > #3: nvme/003 (fabrics transport)
> > 
> >     When the nvme test group is run with trtype=rdma or tcp, the test case fails
> >     with the lockdep WARNING "possible circular locking dependency detected".
> >     Reported in May/2023. Bart suggested a fix for trtype=rdma [4], but it
> >     needs more discussion.
> > 
> >     [4] https://lore.kernel.org/linux-nvme/20230511150321.103172-1-bvanassche@acm.org/
> 
> This patch is unfortunately incorrect and buggy.
> 
> The patch below will likely make the issue go away, but it reintroduces
> an old issue where a client can DDoS a target by bombarding it with
> connect/disconnect requests. When releases are asynchronous and there is
> no back-pressure, that is likely to happen.
> --
> diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
> index 4597bca43a6d..8b4f4aa48206 100644
> --- a/drivers/nvme/target/rdma.c
> +++ b/drivers/nvme/target/rdma.c
> @@ -1582,11 +1582,6 @@ static int nvmet_rdma_queue_connect(struct rdma_cm_id *cm_id,
>                 goto put_device;
>         }
> 
> -       if (queue->host_qid == 0) {
> -               /* Let inflight controller teardown complete */
> -               flush_workqueue(nvmet_wq);
> -       }
> -
>         ret = nvmet_rdma_cm_accept(cm_id, queue, &event->param.conn);
>         if (ret) {
>                 /*
> diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
> index 868aa4de2e4c..c8cfa19e11c7 100644
> --- a/drivers/nvme/target/tcp.c
> +++ b/drivers/nvme/target/tcp.c
> @@ -1844,11 +1844,6 @@ static u16 nvmet_tcp_install_queue(struct nvmet_sq *sq)
>         struct nvmet_tcp_queue *queue =
>                 container_of(sq, struct nvmet_tcp_queue, nvme_sq);
> 
> -       if (sq->qid == 0) {
> -               /* Let inflight controller teardown complete */
> -               flush_workqueue(nvmet_wq);
> -       }
> -
>         queue->nr_cmds = sq->size * 2;
>         if (nvmet_tcp_alloc_cmds(queue))
>                 return NVME_SC_INTERNAL;
> --

Thanks Sagi. I tried the patch above and confirmed that the lockdep WARN
disappears for both rdma and tcp. This indicates that the
flush_workqueue(nvmet_wq) call introduced the circular lock dependency. I also
found the two commits below, which record why flush_workqueue(nvmet_wq) was
introduced.

 777dc82395de ("nvmet-rdma: occasionally flush ongoing controller teardown")
 8832cf922151 ("nvmet: use a private workqueue instead of the system workqueue")
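Roughly speaking, lockdep treats flush_workqueue() as acquiring a pseudo-lock for the workqueue (shown in reports as "(wq_completion)..."), and treats a work item running on that workqueue as holding it. A flush issued while holding some lock that the teardown work also takes therefore closes a cycle. The sketch below is a minimal userspace model of that bookkeeping, not lockdep itself: LOCK_CONNECT_PATH is a hypothetical stand-in for whatever lock the connect path holds when it calls flush_workqueue(nvmet_wq), not a lock taken from the actual report, and all function names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical lock classes, for illustration only. */
enum demo_lock {
	LOCK_WQ_COMPLETION, /* pseudo-lock standing in for "(wq_completion)nvmet_wq" */
	LOCK_CONNECT_PATH,  /* hypothetical lock held by the connect path */
	NR_DEMO_LOCKS
};

/* dep[a][b] is true once lock b has been acquired while lock a was held. */
struct lock_graph {
	bool dep[NR_DEMO_LOCKS][NR_DEMO_LOCKS];
};

static void lock_graph_init(struct lock_graph *g)
{
	memset(g, 0, sizeof(*g));
}

/* Record the ordering "held -> acquired". */
static void record_dep(struct lock_graph *g, int held, int acquired)
{
	assert(held >= 0 && held < NR_DEMO_LOCKS);
	assert(acquired >= 0 && acquired < NR_DEMO_LOCKS);
	g->dep[held][acquired] = true;
}

/* Depth-first search over recorded orderings: is 'to' reachable from 'from'? */
static bool reachable(const struct lock_graph *g, int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NR_DEMO_LOCKS; next++)
		if (g->dep[from][next] && !seen[next] &&
		    reachable(g, next, to, seen))
			return true;
	return false;
}

/* Acquiring 'acquired' while holding 'held' closes a cycle if 'held'
 * is already reachable from 'acquired'. */
static bool closes_cycle(const struct lock_graph *g, int held, int acquired)
{
	bool seen[NR_DEMO_LOCKS] = { false };

	return reachable(g, acquired, held, seen);
}
```

For example, after record_dep(&g, LOCK_WQ_COMPLETION, LOCK_CONNECT_PATH) (teardown work takes the connect-path lock), closes_cycle(&g, LOCK_CONNECT_PATH, LOCK_WQ_COMPLETION) returns true for a flush issued under that lock. Removing the flush removes the second edge, which is consistent with the WARN disappearing.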

The remaining question is how to avoid both the connect/disconnect bombarding
DDoS and the possible circular locking related to waiting for nvmet_wq completion.
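To make the back-pressure side of the trade-off concrete, here is a toy userspace model in C. It is not kernel code; struct wq_model, simulate() and the other names are hypothetical. It assumes a client that connects and immediately disconnects once per cycle, and a teardown worker that completes only one teardown every drain_every cycles, i.e. slower than the client. Calling the stand-in for flush_workqueue(nvmet_wq) before each admin connect keeps the backlog bounded; skipping it lets the backlog grow with the number of cycles.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model: each disconnect queues one controller teardown on a work
 * queue (a stand-in for nvmet_wq) that a single worker drains. */
struct wq_model {
	size_t pending;     /* teardowns queued but not yet completed */
	size_t max_pending; /* high-water mark of 'pending' */
};

static void model_disconnect(struct wq_model *wq)
{
	wq->pending++;
	if (wq->pending > wq->max_pending)
		wq->max_pending = wq->pending;
}

static void model_worker_complete_one(struct wq_model *wq)
{
	if (wq->pending > 0)
		wq->pending--;
}

/* Stand-in for flush_workqueue(): the caller waits until the worker has
 * drained everything queued so far. */
static void model_flush(struct wq_model *wq)
{
	while (wq->pending > 0)
		model_worker_complete_one(wq);
	assert(wq->pending == 0);
}

/* One host performs 'cycles' admin-queue connect/disconnect pairs.
 * Returns the peak number of outstanding teardowns. */
static size_t simulate(size_t cycles, size_t drain_every,
		       bool flush_on_admin_connect)
{
	struct wq_model wq = { 0, 0 };

	for (size_t i = 0; i < cycles; i++) {
		if (flush_on_admin_connect)
			model_flush(&wq); /* connect waits: back-pressure */
		/* connect accepted; the host disconnects right away */
		model_disconnect(&wq);
		/* drain_every == 0 models a worker that never gets to run */
		if (drain_every != 0 && (i + 1) % drain_every == 0)
			model_worker_complete_one(&wq);
	}
	return wq.max_pending;
}
```

With drain_every = 2 and 100 cycles, simulate() reports a peak of 1 outstanding teardown with the flush and 51 without it. The model deliberately ignores what each pending teardown costs on a real target, and it says nothing about the lockdep side of the trade-off.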

