nvme-tcp host potential bugs
Sagi Grimberg
sagi at grimberg.me
Wed Dec 8 03:33:09 PST 2021
> Hello Sagi,
>
> We would like to share two host nvme-tcp issues we have encountered on kernel version 4.18.0-348.2.1.el8_5.x86_64 and hear your input
>
> First issue:
> As part of nvme_tcp_create_ctrl, ctrl->queues is allocated per ctrl with size queue_count
> The ctrl queue_count is set in nvme_tcp_alloc_io_queues according to nr_io_queues
The ctrl->queues array is allocated just once, in create_ctrl, is capped
by opts->nr_io_queues, and should never exceed that size.
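
For reference, a rough sketch of that allocation (paraphrased from the
upstream driver from memory; your 4.18 backport may size it slightly
differently, e.g. also accounting for write/poll queues):

	/* sketch, paraphrased: nvme_tcp_create_ctrl() sizes the queues array
	 * once from the connect options; it is never reallocated on reconnect */
	ctrl->ctrl.queue_count = opts->nr_io_queues + 1; /* +1 for the admin queue */
	ctrl->queues = kcalloc(ctrl->ctrl.queue_count,
			       sizeof(*ctrl->queues), GFP_KERNEL);
	if (!ctrl->queues) {
		ret = -ENOMEM;
		goto out_free_ctrl;
	}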
>
> We see a potential issue in the following scenario:
> A connection is being established with x I/O queues and ctrl->queues is allocated to be of size x + 1
This means that opts->nr_io_queues == x
> Assuming there is a reconnection (due to a timeout or any other reason)
> The new connection is being established with y I/O queues (where y > x)
That should not be possible. It can be that the controller refused to
accept all the queues that the host asked for, which means that the
user wants z queues, the controller accepted x queues in the first go
and y queues in the second go, where x < y <= z (according to your
example).
ctrl->queues is sized for z queues; in the first round the host
connected x of these queues, and in the second round it connected y of
them.
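
To illustrate (a rough sketch of the logic, not the exact code in that
kernel): the I/O queue count requested on every connect/reconnect is
derived from the same opts->nr_io_queues that sized the array, and
nvme_set_queue_count() can only lower it, so queue_count should never
outgrow ctrl->queues:

/* rough sketch, paraphrased from the upstream driver */
static int nvme_tcp_alloc_io_queues(struct nvme_ctrl *ctrl)
{
	unsigned int nr_io_queues;
	int ret;

	nr_io_queues = min(ctrl->opts->nr_io_queues, num_online_cpus());

	/* Set Features / Number of Queues: the controller may grant fewer */
	ret = nvme_set_queue_count(ctrl, &nr_io_queues);
	if (ret)
		return ret;

	ctrl->queue_count = nr_io_queues + 1;
	dev_info(ctrl->device, "creating %u I/O queues.\n", nr_io_queues);

	return __nvme_tcp_alloc_io_queues(ctrl);
}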
>
> In this case, ctrl->queues was previously allocated with queue_count x + 1
> But now queue_count is being updated to y + 1
> As part of nvme_tcp_alloc_queue, we have
> struct nvme_tcp_queue *queue = &ctrl->queues[qid];
> which might lead to accessing an out-of-range memory location (when qid >= x + 1)
> again, ctrl->queues was allocated with queue_count == x + 1, not y + 1
>
> To prove the above theory, we added some debug prints when using x == 8 and y == 48:
>
> #creating 8 I/O queues, queue_count == 9, queues points to 00000000fd0a0f0f
Yes, but what is the array size?
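If it helps, a hypothetical one-off debug print right after the
allocation in nvme_tcp_create_ctrl (not something that exists in the
driver) would answer that directly:

	/* hypothetical: print the size ctrl->queues was actually allocated with */
	pr_info("nvme-tcp: queues %p allocated for queue_count %u\n",
		ctrl->queues, ctrl->ctrl.queue_count);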
> Nov 30 14:02:25 nc9127122.drm.lab.emc.com kernel: nvme nvme15: creating 8 I/O queues. queues 00000000fd0a0f0f queue_count 9
> Nov 30 14:02:25 nc9127122.drm.lab.emc.com kernel: nvme nvme15: mapped 8/0/0 default/read/poll queues.
> Nov 30 14:02:25 nc9127122.drm.lab.emc.com kernel: nvme nvme15: Successfully reconnected (1 attempt)
>
> #Timeout occurs that leads to reconnecting:
> Nov 30 14:02:42 nc9127122.drm.lab.emc.com kernel: nvme nvme15: queue 0: timeout request 0x0 type 4
> Nov 30 14:02:42 nc9127122.drm.lab.emc.com kernel: nvme nvme15: starting error recovery
> Nov 30 14:02:42 nc9127122.drm.lab.emc.com kernel: nvme nvme15: failed nvme_keep_alive_end_io error=10
> Nov 30 14:02:42 nc9127122.drm.lab.emc.com kernel: nvme nvme15: Reconnecting in 10 seconds...
>
> #Creating 48 I/O queues, queue_count == 49, queues points again to 00000000fd0a0f0f
again, what is the array size?
> Nov 30 14:02:52 nc9127122.drm.lab.emc.com kernel: nvme nvme15: creating 48 I/O queues. queues 00000000fd0a0f0f queue_count 49
>
> Second issue:
> With the same example as above, where x < y
> As part of the reconnection process, nvme_tcp_configure_io_queues is being called
> In this function, nvme_tcp_start_io_queues is being called with the new (y) queue_count
> Which will lead to an error (when submitting the I/O connect command through the block layer)
But the host won't connect x queues, it will only connect y queues.
Maybe I'm missing something?
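
For completeness, a rough sketch of the start loop (paraphrased, not
the exact code in that kernel): it only walks the queue_count that was
set for this connect attempt, so every queue it sends an I/O connect
on was allocated and TCP-connected in the same round:

/* rough sketch, paraphrased from the upstream driver */
static int nvme_tcp_start_io_queues(struct nvme_ctrl *ctrl)
{
	int i, ret;

	/* queue 0 is the admin queue; I/O queues are 1..queue_count-1 */
	for (i = 1; i < ctrl->queue_count; i++) {
		ret = nvme_tcp_start_queue(ctrl, i);
		if (ret)
			goto out_stop_queues;
	}

	return 0;

out_stop_queues:
	for (i--; i >= 1; i--)
		nvme_tcp_stop_queue(ctrl, i);
	return ret;
}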
>
> Thanks,
> Amit
>
>
> Internal Use - Confidential
I'm assuming that this is not confidential as you are posting this
to Linux-nvme, so please drop this notice from your upstream mails.