[PATCH] nvme-rdma: fix deadlock when delete ctrl due to reconnect fail

Chao Leng lengchao at huawei.com
Mon Jul 27 23:06:15 EDT 2020



On 2020/7/28 7:31, Sagi Grimberg wrote:
> 
>>> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
>>> index f8f856dc0c67..b381e2cde50a 100644
>>> --- a/drivers/nvme/host/rdma.c
>>> +++ b/drivers/nvme/host/rdma.c
>>> @@ -989,8 +989,7 @@ static void nvme_rdma_teardown_io_queues(struct nvme_rdma_ctrl *ctrl,
>>>                   nvme_cancel_request, &ctrl->ctrl);
>>>               blk_mq_tagset_wait_completed_request(ctrl->ctrl.tagset);
>>>           }
>>> -        if (remove)
>>> -            nvme_start_queues(&ctrl->ctrl);
>>> +        nvme_start_queues(&ctrl->ctrl);
>>
>> This will fail I/O during controller reset, so nak on this.
The I/O will not fail. With native multipath or dm-multipath,
nvme_rdma_queue_rq returns an I/O error, and multipath then fails over
to another path and retries the I/O, which is what we expect. Without
multipath, nvme_rdma_queue_rq returns BLK_STS_RESOURCE, and the upper
layer requeues and retries. Admittedly there is a weakness: the I/O is
retried every BLK_MQ_RESOURCE_DELAY (3 ms) while reconnecting. Because a
controller reset may take a long time, and NVMe over RoCE is mainly used
with multipath software, during a controller reset we expect to fail over
to another path and retry the I/O there, just like during error recovery.
Without multipath, we can tolerate the repeated I/O retries during error
recovery or controller reset.
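
For reference, whether such an I/O fails over or gets requeued is decided
in the fabrics helper that nvme_rdma_queue_rq calls when the queue is not
ready. A simplified sketch (based on nvmf_fail_nonready_command() in
drivers/nvme/host/fabrics.c around this kernel version; exact states and
status codes may differ in other trees):

blk_status_t nvmf_fail_nonready_command(struct nvme_ctrl *ctrl,
		struct request *rq)
{
	/* non-multipath, retryable I/O: let blk-mq requeue and retry */
	if (ctrl->state != NVME_CTRL_DELETING &&
	    ctrl->state != NVME_CTRL_DEAD &&
	    !blk_noretry_request(rq) && !(rq->cmd_flags & REQ_NVME_MPATH))
		return BLK_STS_RESOURCE;

	/*
	 * multipath (REQ_NVME_MPATH) I/O: complete with a path error so
	 * the request can fail over to another path immediately
	 */
	nvme_req(rq)->status = NVME_SC_HOST_PATH_ERROR;
	blk_mq_start_request(rq);
	nvme_complete_rq(rq);
	return BLK_STS_IOERR;
}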

> 
> Can you try this:
> -- 
> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
> index d58231636d11..96c0d664fe9b 100644
> --- a/drivers/nvme/host/rdma.c
> +++ b/drivers/nvme/host/rdma.c
> @@ -1149,6 +1149,11 @@ static void nvme_rdma_reconnect_ctrl_work(struct work_struct *work)
>          return;
> 
>   requeue:
> +       /*
> +        * make sure queues are not quiesced due to a reconnect
> +        * sequence that failed after creating some I/O queues
> +        */
> +       nvme_start_queues(ctrl);
>          dev_info(ctrl->ctrl.device, "Failed reconnect attempt %d\n",
>                          ctrl->ctrl.nr_reconnects);
>          nvme_rdma_reconnect_or_remove(ctrl);
> -- 

Surely this can solve the problem, but for a controller reset in a
multipath environment we expect to fail over to another path and retry
the I/O, just like during error recovery. For both controller reset and
error recovery we need to unquiesce the queues, so
nvme_rdma_teardown_io_queues should unquiesce the queues directly after
cancelling the requests, without caring about the 'remove' parameter.
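
With that change, the I/O teardown would look roughly like this (a sketch
reconstructed around the hunk quoted at the top of this mail; the
surrounding lines are from memory and may differ slightly between kernel
versions):

static void nvme_rdma_teardown_io_queues(struct nvme_rdma_ctrl *ctrl,
		bool remove)
{
	if (ctrl->ctrl.queue_count > 1) {
		nvme_stop_queues(&ctrl->ctrl);	/* quiesce request queues */
		nvme_rdma_stop_io_queues(ctrl);
		if (ctrl->ctrl.tagset) {
			blk_mq_tagset_busy_iter(ctrl->ctrl.tagset,
				nvme_cancel_request, &ctrl->ctrl);
			blk_mq_tagset_wait_completed_request(ctrl->ctrl.tagset);
		}
		/*
		 * unquiesce unconditionally, not only when removing, so that
		 * pending I/O can fail over (multipath) or be requeued
		 */
		nvme_start_queues(&ctrl->ctrl);
		nvme_rdma_destroy_io_queues(ctrl, remove);
	}
}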
With this solution, if nvme_rdma_setup_ctrl fails in
nvme_rdma_reset_ctrl_work, nvme_rdma_reconnect_ctrl_work will be started,
so there is no difference. If we do not want to tolerate the repeated I/O
retries during controller reset, we can do it like this:
---
  drivers/nvme/host/rdma.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index b381e2cde50a..ac79c4d294c2 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -1936,6 +1936,7 @@ nvme_rdma_timeout(struct request *rq, bool reserved)
  		 */
  		flush_work(&ctrl->err_work);
  		nvme_rdma_teardown_io_queues(ctrl, false);
+		nvme_start_queues(&ctrl->ctrl);
  		nvme_rdma_teardown_admin_queue(ctrl, false);
  		return BLK_EH_DONE;
  	}
-- 
2.16.4



