[RFC] nvme-tcp: Handle number of queue changes
Sagi Grimberg
sagi at grimberg.me
Wed Aug 17 04:28:12 PDT 2022
> On reconnect, the number of queues might have changed. In the case
> where the old number of queues is lower than the new one, we try to
> restart the new number of queues. At this point though, the tagset
> has not yet been updated and the operation fails.
>
> Thus, on reconnect start only the old number of queues, then update
> the tag set, and then start the rest of the newly added queues.
I agree this change is needed. The same issue exists in rdma and
possibly others...
>
> Signed-off-by: Daniel Wagner <dwagner at suse.de>
> ---
>
> We got a report from our customer that a firmware upgrade on the
> storage array is able to 'break' the host. This is caused by a change
> in the number of queues the target supports after a reconnect.
>
> Let's assume the number of queues is 8 and all is working fine. Then
> the connection is dropped and the host starts to try to
> reconnect. Eventually, this succeeds but now the new number of queues
> is 10:
>
> nvme0: creating 8 I/O queues.
> nvme0: mapped 8/0/0 default/read/poll queues.
> nvme0: new ctrl: NQN "nvmet-test", addr 10.100.128.29:4420
> nvme0: queue 0: timeout request 0x0 type 4
> nvme0: starting error recovery
> nvme0: failed nvme_keep_alive_end_io error=10
> nvme0: Reconnecting in 10 seconds...
> nvme0: failed to connect socket: -110
> nvme0: Failed reconnect attempt 1
> nvme0: Reconnecting in 10 seconds...
> nvme0: creating 10 I/O queues.
> nvme0: Connect command failed, error wo/DNR bit: -16389
> nvme0: failed to connect queue: 9 ret=-5
> nvme0: Failed reconnect attempt 2
>
> As you can see, queue number 9 is not able to connect.
>
> As the order of starting and unfreezing is important, we can't just
> move the start of the queues after the tagset update. So my stupid
> idea was to start just the old number of queues first and then the rest.
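The capping logic in the patch below reduces to a two-pass scheme: start up to the old tagset size, update the tagset, then start again. Here is a minimal standalone C model of that idea (the `model_*` struct and helper names are illustrative, not the kernel API):

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_QUEUES 16

/*
 * Illustrative model only: queue 0 is the admin queue, I/O queues
 * are 1..queue_count-1, and nr_hw_queues mimics the (possibly stale)
 * tagset size at reconnect time.
 */
struct model_ctrl {
	int queue_count;	/* new total, including the admin queue */
	int nr_hw_queues;	/* tagset still holds the old I/O queue count */
	bool live[MAX_QUEUES];
};

static int min_int(int a, int b) { return a < b ? a : b; }

/* Both passes: start I/O queues, capped at what the tagset supports. */
static void model_start_io_queues(struct model_ctrl *c)
{
	int qcnt = min_int(c->nr_hw_queues + 1, c->queue_count);

	for (int i = 1; i < qcnt; i++)
		c->live[i] = true;	/* the real code connects the queue here */
}

/* The tagset update happens between the two passes. */
static void model_update_tagset(struct model_ctrl *c)
{
	c->nr_hw_queues = c->queue_count - 1;
}
```

With the 8-to-10-queue reconnect from the report above, the first pass only touches queues 1..8; queues 9 and 10 are started by the second pass, after the tagset knows about them.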
>
> This seems to work:
>
> nvme nvme0: creating 4 I/O queues.
> nvme nvme0: mapped 4/0/0 default/read/poll queues.
> nvme_tcp_start_io_queues nr_hw_queues 4 queue_count 5 qcnt 5
> nvme_tcp_start_io_queues nr_hw_queues 4 queue_count 5 qcnt 5
> nvme nvme0: new ctrl: NQN "nvmet-test", addr 10.100.128.29:4420
> nvme nvme0: queue 0: timeout request 0x0 type 4
> nvme nvme0: starting error recovery
> nvme0: Keep Alive(0x18), Unknown (sct 0x3 / sc 0x71)
> nvme nvme0: failed nvme_keep_alive_end_io error=10
> nvme nvme0: Reconnecting in 10 seconds...
> nvme nvme0: creating 6 I/O queues.
> nvme_tcp_start_io_queues nr_hw_queues 4 queue_count 7 qcnt 5
> nvme nvme0: mapped 6/0/0 default/read/poll queues.
> nvme_tcp_start_io_queues nr_hw_queues 6 queue_count 7 qcnt 7
> nvme nvme0: Successfully reconnected (1 attempt)
> nvme nvme0: starting error recovery
> nvme0: Keep Alive(0x18), Unknown (sct 0x3 / sc 0x71)
> nvme nvme0: failed nvme_keep_alive_end_io error=10
> nvme nvme0: Reconnecting in 10 seconds...
> nvme nvme0: creating 4 I/O queues.
> nvme_tcp_start_io_queues nr_hw_queues 6 queue_count 5 qcnt 5
> nvme nvme0: mapped 4/0/0 default/read/poll queues.
> nvme_tcp_start_io_queues nr_hw_queues 4 queue_count 5 qcnt 5
> nvme nvme0: Successfully reconnected (1 attempt)
>
> And yes, the patch per se is ugly, but I didn't want to reorganize
> the code at this point. It's just an illustration of my idea.
>
> Feedback highly appreciated!
>
> Thanks,
> Daniel
>
>
> drivers/nvme/host/tcp.c | 23 ++++++++++++++++++-----
> 1 file changed, 18 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
> index e82dcfcda29b..177d2f5d9047 100644
> --- a/drivers/nvme/host/tcp.c
> +++ b/drivers/nvme/host/tcp.c
> @@ -1671,6 +1671,9 @@ static int nvme_tcp_start_queue(struct nvme_ctrl *nctrl, int idx)
> 	struct nvme_tcp_ctrl *ctrl = to_tcp_ctrl(nctrl);
> 	int ret;
>
> +	if (test_bit(NVME_TCP_Q_LIVE, &ctrl->queues[idx].flags))
> +		return 0;
> +
How about we make precise calls that start only the queues we
need, instead of making this re-entrant safe... Although this does
follow the same model as pci...
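For reference, the guard added above makes a second start call a no-op. A standalone sketch of that idempotent pattern (the `model_*` names and bit helpers are simplified stand-ins for the kernel's `test_bit`/`set_bit` and the `NVME_TCP_Q_LIVE` flag, without atomicity):

```c
#include <assert.h>
#include <stdbool.h>

enum { MODEL_Q_LIVE = 0 };	/* stand-in for NVME_TCP_Q_LIVE */

struct model_queue {
	unsigned long flags;
	int connect_calls;	/* counts real connection attempts */
};

/* Simplified, non-atomic stand-ins for the kernel bit helpers. */
static bool model_test_bit(int nr, const unsigned long *addr)
{
	return (*addr >> nr) & 1UL;
}

static void model_set_bit(int nr, unsigned long *addr)
{
	*addr |= 1UL << nr;
}

static int model_start_queue(struct model_queue *q)
{
	if (model_test_bit(MODEL_Q_LIVE, &q->flags))
		return 0;	/* already live: calling again is harmless */

	q->connect_calls++;	/* the real code issues the connect here */
	model_set_bit(MODEL_Q_LIVE, &q->flags);
	return 0;
}
```

Calling `model_start_queue()` twice performs only one connect, which is what lets the second `nvme_tcp_start_io_queues()` pass safely re-walk the already-live queues.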
> 	if (idx)
> 		ret = nvmf_connect_io_queue(nctrl, idx);
> 	else
> @@ -1759,12 +1762,18 @@ static void nvme_tcp_stop_io_queues(struct nvme_ctrl *ctrl)
> 		nvme_tcp_stop_queue(ctrl, i);
> }
>
> -static int nvme_tcp_start_io_queues(struct nvme_ctrl *ctrl)
> +static int nvme_tcp_start_io_queues(struct nvme_ctrl *nctrl)
> {
> -	int i, ret;
> +	struct nvme_tcp_ctrl *ctrl = to_tcp_ctrl(nctrl);
> +	int i, ret, qcnt;
>
> -	for (i = 1; i < ctrl->queue_count; i++) {
> -		ret = nvme_tcp_start_queue(ctrl, i);
> +	qcnt = min(ctrl->tag_set.nr_hw_queues + 1, nctrl->queue_count);
> +
> +	dev_info(nctrl->device, "nr_hw_queues %d queue_count %d qcnt %d\n",
> +		 ctrl->tag_set.nr_hw_queues, nctrl->queue_count, qcnt);
No need for this print as info, it's just spamming...
> +
> +	for (i = 1; i < qcnt; i++) {
> +		ret = nvme_tcp_start_queue(nctrl, i);
> 		if (ret)
> 			goto out_stop_queues;
> 	}
> @@ -1773,7 +1782,7 @@ static int nvme_tcp_start_io_queues(struct nvme_ctrl *ctrl)
>
> out_stop_queues:
> 	for (i--; i >= 1; i--)
> -		nvme_tcp_stop_queue(ctrl, i);
> +		nvme_tcp_stop_queue(nctrl, i);
> 	return ret;
> }
>
> @@ -1934,6 +1943,10 @@ static int nvme_tcp_configure_io_queues(struct nvme_ctrl *ctrl, bool new)
> 		nvme_unfreeze(ctrl);
> 	}
>
> +	ret = nvme_tcp_start_io_queues(ctrl);
> +	if (ret)
> +		goto out_cleanup_connect_q;
> +
Shouldn't we do the same for the create itself? Is there any chance that
we will leave a tcp connection idle in any situation?