[PATCH] nvme-rdma: fix deadlock when delete ctrl due to reconnect fail
Sagi Grimberg
sagi at grimberg.me
Mon Jul 27 23:32:53 EDT 2020
>>>> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
>>>> index f8f856dc0c67..b381e2cde50a 100644
>>>> --- a/drivers/nvme/host/rdma.c
>>>> +++ b/drivers/nvme/host/rdma.c
>>>> @@ -989,8 +989,7 @@ static void nvme_rdma_teardown_io_queues(struct nvme_rdma_ctrl *ctrl,
>>>> nvme_cancel_request, &ctrl->ctrl);
>>>> blk_mq_tagset_wait_completed_request(ctrl->ctrl.tagset);
>>>> }
>>>> - if (remove)
>>>> - nvme_start_queues(&ctrl->ctrl);
>>>> + nvme_start_queues(&ctrl->ctrl);
>>>
>>> This will fail I/O during controller reset, so nak on this.
> The I/O will not fail. When running with native multipath or dm-multipath,
> nvme_rdma_queue_rq will return an I/O error, and multipath will then
> fail over to another path and retry the I/O, which is what we expect.
> Without multipath, nvme_rdma_queue_rq will return BLK_STS_RESOURCE,
> and the upper layer will requeue and retry. Admittedly there is a
> weakness: the I/O will be retried every BLK_MQ_RESOURCE_DELAY (3 ms)
> while reconnecting. Because a controller reset may take a long time,
> and NVMe over RoCE is mainly used with multipath software, during a
> controller reset we expect to fail over to another path and retry the
> I/O, just like in error recovery. Without multipath, we tolerate the
> repeated I/O retries during error recovery or controller reset.
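> To illustrate that decision, here is a minimal stand-alone model
> (userspace C, illustrative names only, not the real fabrics/rdma code):
>
> #include <stdbool.h>
>
> enum blk_status { BLK_STS_OK, BLK_STS_RESOURCE, BLK_STS_IOERR };
>
> /* What queue_rq effectively does while the queue is not live. */
> static enum blk_status nonready_rq(bool multipath)
> {
> 	if (multipath)
> 		return BLK_STS_IOERR;	/* fail fast, mpath retries on another path */
> 	return BLK_STS_RESOURCE;	/* blk-mq requeues, retries after ~3 ms */
> }
>
> int main(void)
> {
> 	return (nonready_rq(true) == BLK_STS_IOERR &&
> 		nonready_rq(false) == BLK_STS_RESOURCE) ? 0 : 1;
> }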
I/O should not fail during reset, mpath or not, period.
>> Can you try this:
>> --
>> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
>> index d58231636d11..96c0d664fe9b 100644
>> --- a/drivers/nvme/host/rdma.c
>> +++ b/drivers/nvme/host/rdma.c
>> @@ -1149,6 +1149,11 @@ static void nvme_rdma_reconnect_ctrl_work(struct work_struct *work)
>> return;
>>
>> requeue:
>> + /*
>> + * make sure queues are not quiesced due to a reconnect
>> + * sequence that failed after creating some I/O queues
>> + */
>> + nvme_start_queues(ctrl);
>> dev_info(ctrl->ctrl.device, "Failed reconnect attempt %d\n",
>> ctrl->ctrl.nr_reconnects);
>> nvme_rdma_reconnect_or_remove(ctrl);
>> --
> Surely this can solve the problem, but during a controller reset in a
> multipath environment we expect to fail over to another path and retry
> the I/O, just like in error recovery. For both controller reset and
> error recovery we need to unquiesce the queues, so
> nvme_rdma_teardown_io_queues should simply unquiesce the queues after
> cancelling the requests, regardless of the remove parameter.
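> Roughly, nvme_rdma_teardown_io_queues would then read as below (a
> sketch only; the context lines are reconstructed from the hunk quoted
> at the top of this thread, not verified against the exact tree):
>
> static void nvme_rdma_teardown_io_queues(struct nvme_rdma_ctrl *ctrl,
> 		bool remove)
> {
> 	if (ctrl->ctrl.queue_count > 1) {
> 		nvme_stop_queues(&ctrl->ctrl);
> 		nvme_rdma_stop_io_queues(ctrl);
> 		if (ctrl->ctrl.tagset) {
> 			blk_mq_tagset_busy_iter(ctrl->ctrl.tagset,
> 				nvme_cancel_request, &ctrl->ctrl);
> 			blk_mq_tagset_wait_completed_request(ctrl->ctrl.tagset);
> 		}
> 		/* unquiesce for reset and error recovery as well as remove */
> 		nvme_start_queues(&ctrl->ctrl);
> 		nvme_rdma_destroy_io_queues(ctrl, remove);
> 	}
> }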
> With your solution, if nvme_rdma_setup_ctrl fails in
> nvme_rdma_reset_ctrl_work, it will start nvme_rdma_reconnect_ctrl_work,
> so there is no difference.
> If we do not tolerate repeated I/O retries during controller reset,
> we can do it like this:
> ---
> drivers/nvme/host/rdma.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
> index b381e2cde50a..ac79c4d294c2 100644
> --- a/drivers/nvme/host/rdma.c
> +++ b/drivers/nvme/host/rdma.c
> @@ -1936,6 +1936,7 @@ nvme_rdma_timeout(struct request *rq, bool reserved)
> */
> flush_work(&ctrl->err_work);
> nvme_rdma_teardown_io_queues(ctrl, false);
> + nvme_start_queues(&ctrl->ctrl);
> nvme_rdma_teardown_admin_queue(ctrl, false);
> return BLK_EH_DONE;
> }
That is fine too, just do not unquiesce the queues in a normal reset.