[PATCH v2 3/3] nvme-rdma: Handle number of queue changes

Sun Aug 28 05:20:52 PDT 2022

>> On Fri, Aug 26, 2022 at 03:31:15PM +0800, Chao Leng wrote:
>>>> After seeing both versions I tend to say the first one keeps the
>>>> 'weird' stuff closer together and doesn't make the call site of
>>>> nvme_rdma_start_io_queues() do the counting. So my personal preference
>>> I don't understand "do the counting".
>>
>> Sorry. I meant we first start only the queues that have resources
>> allocated (nr_queues in my patch). After that we only need to start
>> any newly added queues.
>>
>>> Show the code:
>>> ---
>>>   drivers/nvme/host/rdma.c | 9 ++++-----
>>>   1 file changed, 4 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
>>> index 7d01fb770284..8dfb79726e13 100644
>>> --- a/drivers/nvme/host/rdma.c
>>> +++ b/drivers/nvme/host/rdma.c
>>> @@ -980,10 +980,6 @@ static int nvme_rdma_configure_io_queues(struct nvme_rdma_ctrl *ctrl, bool new)
>>>                          goto out_free_tag_set;
>>>          }
>>>
>>> -       ret = nvme_rdma_start_io_queues(ctrl);
>>> -       if (ret)
>>> -               goto out_cleanup_connect_q;
>>
>> Again, these need to start so that...
>>
>>> -
>>>          if (!new) {
>>>                  nvme_start_queues(&ctrl->ctrl);
>>>                  if (!nvme_wait_freeze_timeout(&ctrl->ctrl,
>>
>> ... this here has a chance to work.
> Some requests will be submitted, fail, and then be retried
> or failed over. This is similar to nvme_cancel_tagset in
> nvme_rdma_teardown_io_queues.
> I think that is acceptable.

Not really...

In order for the queue freeze to complete, all pending IOs need
to complete or error out, and that cannot be guaranteed without
restarting the queues as some may be waiting on tags and need to
be restarted in order to complete.

See 9f98772ba307 ("nvme-rdma: fix controller reset hang during traffic")
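
To illustrate the ordering problem, here is a rough toy model. It is not
kernel code: struct toy_queue and every toy_*() helper are made up for
this sketch, and it assumes pending requests can only complete or fail
once their queue is running again.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; every name here is hypothetical, not a kernel API. */
struct toy_queue {
	bool stopped;	/* a stopped queue dispatches nothing */
	int pending;	/* requests that must complete or fail before a freeze ends */
};

/* Run the queue: pending requests are dispatched and complete or error out. */
static void toy_run_queue(struct toy_queue *q)
{
	assert(q->pending >= 0);
	if (!q->stopped)
		q->pending = 0;
}

/* A freeze finishes only once nothing is pending any more. */
static bool toy_freeze_done(const struct toy_queue *q)
{
	return q->pending == 0;
}

/*
 * Model a reconnect that either starts the queues before waiting for
 * the freeze or skips that step. Returns whether the freeze can finish.
 */
static bool toy_reconnect(bool start_queues, int pending)
{
	struct toy_queue q = { .stopped = true, .pending = pending };

	if (start_queues)
		q.stopped = false;
	toy_run_queue(&q);
	return toy_freeze_done(&q);
}
```

In this model toy_reconnect(true, 3) returns true, while
toy_reconnect(false, 3) returns false: with the queues left stopped the
pending requests never drain, so the freeze wait can only time out.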