[RFC] nvme-tcp: Handle number of queue changes

Sagi Grimberg sagi at grimberg.me
Wed Aug 17 04:28:12 PDT 2022


> On reconnect, the number of queues might have changed. In the case
> where the old number of queues is lower than the new one, we try to
> restart the new number of queues. At this point though, the tagset
> has not yet been updated and the operation fails.
> 
> Thus, on reconnect start only the old number of queues, then update
> the tag set, and then start the rest of the newly added queues.

I agree this change is needed. The same issue exists in rdma and
possibly others...
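
From a quick look, rdma has the identical pattern (quoting from memory,
so details may be slightly off): nvme_rdma_start_io_queues() runs

	for (i = 1; i < ctrl->ctrl.queue_count; i++) {
		ret = nvme_rdma_start_queue(ctrl, i);
		if (ret)
			goto out_stop_queues;
	}

before the tagset is resized on reconnect, so it would need the same
kind of split.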


> 
> Signed-off-by: Daniel Wagner <dwagner at suse.de>
> ---
> 
> We got a report from our customer that a firmware upgrade on the
> storage array is able to 'break' the host. This is caused by a change
> in the number of queues the target supports after a reconnect.
> 
> Let's assume the number of queues is 8 and all is working fine. Then
> the connection is dropped and the host starts to try to
> reconnect. Eventually, this succeeds but now the new number of queues
> is 10:
> 
> nvme0: creating 8 I/O queues.
> nvme0: mapped 8/0/0 default/read/poll queues.
> nvme0: new ctrl: NQN "nvmet-test", addr 10.100.128.29:4420
> nvme0: queue 0: timeout request 0x0 type 4
> nvme0: starting error recovery
> nvme0: failed nvme_keep_alive_end_io error=10
> nvme0: Reconnecting in 10 seconds...
> nvme0: failed to connect socket: -110
> nvme0: Failed reconnect attempt 1
> nvme0: Reconnecting in 10 seconds...
> nvme0: creating 10 I/O queues.
> nvme0: Connect command failed, error wo/DNR bit: -16389
> nvme0: failed to connect queue: 9 ret=-5
> nvme0: Failed reconnect attempt 2
> 
> As you can see, queue number 9 is not able to connect.
> 
> As the order of starting and unfreezing is important, we can't just
> move the start of the queues after the tagset update. So my stupid
> idea was to start just the old number of queues first and then the
> rest.
> 
> This seems to work:
> 
> nvme nvme0: creating 4 I/O queues.
> nvme nvme0: mapped 4/0/0 default/read/poll queues.
> nvme_tcp_start_io_queues nr_hw_queues 4 queue_count 5 qcnt 5
> nvme_tcp_start_io_queues nr_hw_queues 4 queue_count 5 qcnt 5
> nvme nvme0: new ctrl: NQN "nvmet-test", addr 10.100.128.29:4420
> nvme nvme0: queue 0: timeout request 0x0 type 4
> nvme nvme0: starting error recovery
> nvme0: Keep Alive(0x18), Unknown (sct 0x3 / sc 0x71)
> nvme nvme0: failed nvme_keep_alive_end_io error=10
> nvme nvme0: Reconnecting in 10 seconds...
> nvme nvme0: creating 6 I/O queues.
> nvme_tcp_start_io_queues nr_hw_queues 4 queue_count 7 qcnt 5
> nvme nvme0: mapped 6/0/0 default/read/poll queues.
> nvme_tcp_start_io_queues nr_hw_queues 6 queue_count 7 qcnt 7
> nvme nvme0: Successfully reconnected (1 attempt)
> nvme nvme0: starting error recovery
> nvme0: Keep Alive(0x18), Unknown (sct 0x3 / sc 0x71)
> nvme nvme0: failed nvme_keep_alive_end_io error=10
> nvme nvme0: Reconnecting in 10 seconds...
> nvme nvme0: creating 4 I/O queues.
> nvme_tcp_start_io_queues nr_hw_queues 6 queue_count 5 qcnt 5
> nvme nvme0: mapped 4/0/0 default/read/poll queues.
> nvme_tcp_start_io_queues nr_hw_queues 4 queue_count 5 qcnt 5
> nvme nvme0: Successfully reconnected (1 attempt)
> 
> And yes, the patch per se is ugly, but I didn't want to reorganize the
> code at this point. It is just an illustration of my idea.
> 
> Feedback highly appreciated!
> 
> Thanks,
> Daniel
> 
> 
>   drivers/nvme/host/tcp.c | 23 ++++++++++++++++++-----
>   1 file changed, 18 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
> index e82dcfcda29b..177d2f5d9047 100644
> --- a/drivers/nvme/host/tcp.c
> +++ b/drivers/nvme/host/tcp.c
> @@ -1671,6 +1671,9 @@ static int nvme_tcp_start_queue(struct nvme_ctrl *nctrl, int idx)
>   	struct nvme_tcp_ctrl *ctrl = to_tcp_ctrl(nctrl);
>   	int ret;
>   
> +	if (test_bit(NVME_TCP_Q_LIVE, &ctrl->queues[idx].flags))
> +		return 0;
> +

How about we make the callers issue precise calls to start exactly the
queues they need, instead of making this re-entrant safe... Although
this does follow the same model as pci...
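
Something like an explicit [first, last) range, so each call site states
exactly which queues it wants started. Untested sketch on top of this
patch:

static int nvme_tcp_start_io_queues(struct nvme_ctrl *nctrl,
		int first, int last)
{
	int i, ret;

	for (i = first; i < last; i++) {
		ret = nvme_tcp_start_queue(nctrl, i);
		if (ret)
			goto out_stop_queues;
	}

	return 0;

out_stop_queues:
	/* unwind only the queues this call started */
	for (i--; i >= first; i--)
		nvme_tcp_stop_queue(nctrl, i);
	return ret;
}

Then nvme_tcp_configure_io_queues() would call it as
nvme_tcp_start_io_queues(ctrl, 1, nr_queues) before the tagset update
and nvme_tcp_start_io_queues(ctrl, nr_queues, ctrl->queue_count) after
it, with nr_queues computed the same way as qcnt below, and the
NVME_TCP_Q_LIVE check above becomes unnecessary.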

>   	if (idx)
>   		ret = nvmf_connect_io_queue(nctrl, idx);
>   	else
> @@ -1759,12 +1762,18 @@ static void nvme_tcp_stop_io_queues(struct nvme_ctrl *ctrl)
>   		nvme_tcp_stop_queue(ctrl, i);
>   }
>   
> -static int nvme_tcp_start_io_queues(struct nvme_ctrl *ctrl)
> +static int nvme_tcp_start_io_queues(struct nvme_ctrl *nctrl)
>   {
> -	int i, ret;
> +	struct nvme_tcp_ctrl *ctrl = to_tcp_ctrl(nctrl);
> +	int i, ret, qcnt;
>   
> -	for (i = 1; i < ctrl->queue_count; i++) {
> -		ret = nvme_tcp_start_queue(ctrl, i);
> +	qcnt = min(ctrl->tag_set.nr_hw_queues + 1, nctrl->queue_count);
> +
> +	dev_info(nctrl->device, "nr_hw_queues %d queue_count %d qcnt %d\n",
> +	         ctrl->tag_set.nr_hw_queues, nctrl->queue_count, qcnt);

No need for this print at the info level, it's just spamming the log...
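If it's useful while debugging, keep it at debug level with the same
arguments:

	dev_dbg(nctrl->device, "nr_hw_queues %d queue_count %d qcnt %d\n",
		ctrl->tag_set.nr_hw_queues, nctrl->queue_count, qcnt);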

> +
> +	for (i = 1; i < qcnt; i++) {
> +		ret = nvme_tcp_start_queue(nctrl, i);
>   		if (ret)
>   			goto out_stop_queues;
>   	}
> @@ -1773,7 +1782,7 @@ static int nvme_tcp_start_io_queues(struct nvme_ctrl *ctrl)
>   
>   out_stop_queues:
>   	for (i--; i >= 1; i--)
> -		nvme_tcp_stop_queue(ctrl, i);
> +		nvme_tcp_stop_queue(nctrl, i);
>   	return ret;
>   }
>   
> @@ -1934,6 +1943,10 @@ static int nvme_tcp_configure_io_queues(struct nvme_ctrl *ctrl, bool new)
>   		nvme_unfreeze(ctrl);
>   	}
>   
> +	ret = nvme_tcp_start_io_queues(ctrl);
> +	if (ret)
> +		goto out_cleanup_connect_q;
> +

Shouldn't we do the same for the create itself? Is there any chance that
we will leave a tcp connection idle in any situation?
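
To spell out the window I mean, paraphrasing the flow of
nvme_tcp_configure_io_queues() with this patch applied (not the literal
code):

	ret = nvme_tcp_alloc_io_queues(ctrl);	/* connects all queue_count sockets */
	...
	ret = nvme_tcp_start_io_queues(ctrl);	/* starts only up to the old nr_hw_queues */
	...
	if (!new) {
		...
		blk_mq_update_nr_hw_queues(ctrl->tagset, ctrl->queue_count - 1);
		nvme_unfreeze(ctrl);
	}
	ret = nvme_tcp_start_io_queues(ctrl);	/* starts the remainder */

The newly added queues sit connected but not live between the alloc and
the second start call.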


