nvme-tcp host potential bugs
Engel, Amit
Amit.Engel at Dell.com
Sun Dec 12 23:36:55 PST 2021
Hi Sagi,
Regarding the first issue, you are right.
We missed that the ctrl->queues array allocation is capped by opts->nr_io_queues.
However, we do believe that the second issue exists:
As part of create_ctrl,
queue_count == nr_io_queues + 1 == 49 (in our example nr_io_queues == 48 == num_online_cpus()).
As part of nvme_tcp_setup_ctrl,
alloc_io_queues calls set_queue_count, where nr_io_queues is reduced to 8 (in our case, the target ctrl supports up to 8 I/O queues),
so ctrl->queue_count == 8 + 1 == 9.
A timeout then occurs and triggers a reconnect:
After the reconnect, the target ctrl is able to support up to 128 queues.
alloc_io_queues sets nr_io_queues to 48, so ctrl->queue_count == 49.
After 8 iterations, start_io_queues calls nvme_tcp_start_queue with qid 9,
nvmf_connect_io_queue is called with qid 9, and
__nvme_submit_sync_cmd calls into the block layer (blk_mq_alloc_request_hctx) with qid 9, which returns an error.
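To make the failure mode concrete, here is a minimal user-space sketch of the sequence above (a toy model, not the actual kernel code; the tagset range check is our assumption about why blk_mq_alloc_request_hctx errors out, and the helper names are illustrative):

```c
#include <stdbool.h>

/* Toy model: blk_mq_alloc_request_hctx() fails when the hctx index is
 * beyond the number of hardware queues the tagset was built with.
 * We model that here as a simple range check. */
bool start_io_queue(unsigned int qid, unsigned int tagset_hw_queues)
{
    /* I/O qids are 1-based; the corresponding hctx index is qid - 1 */
    return (qid - 1) < tagset_hw_queues;
}

/* Walk qids 1..queue_count-1, the way nvme_tcp_start_io_queues() walks
 * its queues, and return the first qid that fails (0 if none fails). */
unsigned int first_failing_qid(unsigned int queue_count,
                               unsigned int tagset_hw_queues)
{
    for (unsigned int qid = 1; qid < queue_count; qid++)
        if (!start_io_queue(qid, tagset_hw_queues))
            return qid;
    return 0;
}
```

With the tagset still sized for 8 I/O queues but queue_count bumped to 49 after the reconnect, first_failing_qid(49, 8) returns 9, matching the trace above; first_failing_qid(9, 8) returns 0 (no failure before the reconnect).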
Thanks,
Amit
-----Original Message-----
From: Sagi Grimberg <sagi at grimberg.me>
Sent: Wednesday, December 8, 2021 1:33 PM
To: Engel, Amit; linux-nvme at lists.infradead.org
Cc: Anner, Ran
Subject: Re: nvme-tcp host potential bugs
> Hello Sagi,
>
> We would like to share and hear your inputs regarding 2 host nvme-tcp
> issues that we have encountered in kernel version
> 4.18.0-348.2.1.el8_5.x86_64
>
> First issue:
> As part of nvme_tcp_create_ctrl, ctrl->queues is allocated per ctrl
> with size queue_count. The ctrl's queue_count is set as part of
> nvme_tcp_alloc_io_queues, per nr_io_queues.
The ctrl->queues array is allocated just once, in create_ctrl; it is
capped by opts->nr_io_queues and should never exceed this size.
>
> We see a potential issue in the following scenario:
> A connection is being established with x I/O queues and ctrl->queues
> is allocated to be of size x + 1
This means that opts->nr_io_queues == x
> Assuming there is a reconnection (due to a timeout or any other
> reason), the new connection is established with y I/O queues
> (where y > x)
That should not be possible. It can be that the controller refused to accept all the queues that the host asked for: the user wants z queues, the controller accepted x queues in the first go and y queues in the second go, where x < y <= z (according to your example).
ctrl->queues is sized for z queues;
in the first round the host connected x of these queues, and in the second round it connected y of them.
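To make that clamp concrete, here is a toy model of the set_queue_count-style negotiation (the function name is illustrative, not the actual kernel API): the host asks for z = opts->nr_io_queues queues and never uses more, so both x and y stay <= z and ctrl->queues, sized from z at create_ctrl time, is never indexed past its end.

```c
/* Toy model: the negotiated I/O queue count is the minimum of what
 * the host requested (z = opts->nr_io_queues) and what the controller
 * reports it supports. */
unsigned int negotiated_io_queues(unsigned int requested_z,
                                  unsigned int ctrl_supported)
{
    return ctrl_supported < requested_z ? ctrl_supported : requested_z;
}
```

With z == 48: the first connect negotiates min(48, 8) == 8 queues and the reconnect negotiates min(48, 128) == 48, both within the z-sized allocation.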
>
> In this case, ctrl->queues was previously allocated with queue_count ==
> x + 1, but now queue_count is updated to y + 1. As part of
> nvme_tcp_alloc_queue, we have struct nvme_tcp_queue *queue =
> &ctrl->queues[qid]; which might access an out-of-range memory
> location (when qid > x). Again, ctrl->queues was allocated with
> queue_count == x + 1, not y + 1.
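The suspected out-of-bounds access can be sketched as a bounds check (the helper below is hypothetical, not a kernel function): the queues array is allocated once with x + 1 entries, so taking &ctrl->queues[qid] for qid in (x, y] would point past the end of the allocation.

```c
#include <stdbool.h>

/* Illustrative sketch: the queues array holds x + 1 entries
 * (x I/O queues plus the admin queue), so valid indexes are
 * 0..x; any larger qid would index past the allocation. */
bool queues_index_oob(unsigned int qid, unsigned int x)
{
    unsigned int allocated_entries = x + 1;
    return qid >= allocated_entries;
}
```

For the example in the thread (x == 8, y == 48), qids 9 through 48 would all be out of bounds of the 9-entry array.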
>
> To prove the above theory, we added some debug prints when using x == 8 and y == 48:
>
> #creating 8 I/O queues, queue_count == 9, queues points to
> 00000000fd0a0f0f
Yes, but what is the array size?
> Nov 30 14:02:25 nc9127122.drm.lab.emc.com kernel: nvme nvme15: creating 8 I/O queues. queues 00000000fd0a0f0f queue_count 9
> Nov 30 14:02:25 nc9127122.drm.lab.emc.com kernel: nvme nvme15: mapped 8/0/0 default/read/poll queues.
> Nov 30 14:02:25 nc9127122.drm.lab.emc.com kernel: nvme nvme15:
> Successfully reconnected (1 attempt)
>
> #Timeout occurs that leads to reconnecting:
> Nov 30 14:02:42 nc9127122.drm.lab.emc.com kernel: nvme nvme15: queue 0: timeout request 0x0 type 4
> Nov 30 14:02:42 nc9127122.drm.lab.emc.com kernel: nvme nvme15: starting error recovery
> Nov 30 14:02:42 nc9127122.drm.lab.emc.com kernel: nvme nvme15: failed nvme_keep_alive_end_io error=10
> Nov 30 14:02:42 nc9127122.drm.lab.emc.com kernel: nvme nvme15: Reconnecting in 10 seconds...
>
> #Creating 48 I/O queues, queue_count == 49, queues points again to
> 00000000fd0a0f0f
again, what is the array size?
> Nov 30 14:02:52 nc9127122.drm.lab.emc.com kernel: nvme nvme15: creating 48 I/O queues. queues 00000000fd0a0f0f queue_count 49
>
> Second issue:
> With the same example as above, where x < y: as part of the reconnection
> process, nvme_tcp_configure_io_queues is called. In this
> function, nvme_tcp_start_io_queues is called with the new (y)
> queue_count, which will lead to an error (when sending the I/O connect
> command to the block layer).
But the host won't connect x queues, it will only connect y queues.
Maybe I'm missing something?
>
> Thanks,
> Amit
>
>
> Internal Use - Confidential
I'm assuming that this is not confidential as you are posting this to Linux-nvme, so please drop this notice from your upstream mails.