nvme-tcp host potential bugs
Engel, Amit
Amit.Engel at Dell.com
Sun Dec 12 23:36:55 PST 2021
Hi Sagi,
Regarding the first issue, you are right.
We missed that the ctrl->queues array allocation is capped by opts->nr_io_queues.
However, we do believe that the second issue exists:
As part of create_ctrl,
queue_count == nr_io_queues + 1 == 49 (in our example nr_io_queues == 48 == num_online_cpus()).
As part of nvme_tcp_setup_ctrl,
alloc_io_queues calls set_queue_count, where nr_io_queues is reduced to 8 (in our case, the target ctrl supports up to 8 I/O queues),
so ctrl->queue_count == 8 + 1 == 9.
A timeout then occurs and triggers a reconnect:
After the reconnect, the target ctrl is able to support up to 128 queues.
alloc_io_queues sets nr_io_queues to 48, so ctrl->queue_count == 49.
After 8 iterations, start_io_queues calls nvme_tcp_start_queue with qid 9,
nvmf_connect_io_queue is called with qid 9, and
__nvme_submit_sync_cmd calls into the block layer (blk_mq_alloc_request_hctx) with qid 9, which returns an error.
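To make the failure mode concrete, here is a minimal user-space sketch of the sequence above (a toy model, not the actual kernel code; the tagset range check is our assumption about why blk_mq_alloc_request_hctx errors out, and the helper names are illustrative):

```c
#include <stdbool.h>

/* Toy model: blk_mq_alloc_request_hctx() fails when the hctx index is
 * beyond the number of hardware queues the tagset was built with.
 * We model that here as a simple range check. */
bool start_io_queue(unsigned int qid, unsigned int tagset_hw_queues)
{
    /* I/O qids are 1-based; the corresponding hctx index is qid - 1 */
    return (qid - 1) < tagset_hw_queues;
}

/* Walk qids 1..queue_count-1, the way nvme_tcp_start_io_queues() walks
 * its queues, and return the first qid that fails (0 if none fails). */
unsigned int first_failing_qid(unsigned int queue_count,
                               unsigned int tagset_hw_queues)
{
    for (unsigned int qid = 1; qid < queue_count; qid++)
        if (!start_io_queue(qid, tagset_hw_queues))
            return qid;
    return 0;
}
```

With the tagset still sized for 8 I/O queues but queue_count bumped to 49 after the reconnect, first_failing_qid(49, 8) returns 9, matching the trace above; first_failing_qid(9, 8) returns 0 (no failure before the reconnect).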
Thanks,
Amit
-----Original Message-----
From: Sagi Grimberg <sagi at grimberg.me>
Sent: Wednesday, December 8, 2021 1:33 PM
To: Engel, Amit; linux-nvme at lists.infradead.org
Cc: Anner, Ran
Subject: Re: nvme-tcp host potential bugs
> Hello Sagi,
>
> We would like to share and hear your inputs regarding 2 host nvme-tcp
> issues that we have encountered in kernel version
> 4.18.0-348.2.1.el8_5.x86_64
>
> First issue:
> As part of nvme_tcp_create_ctrl, ctrl->queues is allocated per ctrl
> with size queue_count. The ctrl's queue_count is set as part of
> nvme_tcp_alloc_io_queues, per nr_io_queues.
The ctrl->queues array is allocated just once, in create_ctrl; it is
capped by opts->nr_io_queues and should never exceed this size.
>
> We see a potential issue in the following scenario:
> A connection is being established with x I/O queues and ctrl->queues
> is allocated to be of size x + 1
This means that opts->nr_io_queues == x
> Assuming there is a reconnection (due to a timeout or any other
> reason), the new connection is established with y I/O queues
> (where y > x)
That should not be possible. It can be that the controller refused to accept all the queues that the host asked for: the user wants z queues, the controller accepted x queues in the first go and y queues in the second go, where x < y <= z (according to your example).
ctrl->queues is sized for z queues;
in the first round the host connected x of these queues, and in the second round it connected y of them.
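To make that clamp concrete, here is a toy model of the set_queue_count-style negotiation (the function name is illustrative, not the actual kernel API): the host asks for z = opts->nr_io_queues queues and never uses more, so both x and y stay <= z and ctrl->queues, sized from z at create_ctrl time, is never indexed past its end.

```c
/* Toy model: the negotiated I/O queue count is the minimum of what
 * the host requested (z = opts->nr_io_queues) and what the controller
 * reports it supports. */
unsigned int negotiated_io_queues(unsigned int requested_z,
                                  unsigned int ctrl_supported)
{
    return ctrl_supported < requested_z ? ctrl_supported : requested_z;
}
```

With z == 48: the first connect negotiates min(48, 8) == 8 queues and the reconnect negotiates min(48, 128) == 48, both within the z-sized allocation.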
>
> In this case, ctrl->queues was previously allocated with queue_count ==
> x + 1, but now queue_count is updated to y + 1. As part of
> nvme_tcp_alloc_queue, we have struct nvme_tcp_queue *queue =
> &ctrl->queues[qid]; which might access an out-of-range memory
> location (when qid > x). Again, ctrl->queues was allocated with
> queue_count == x + 1, not y + 1.
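The suspected out-of-bounds access can be sketched as a bounds check (the helper below is hypothetical, not a kernel function): the queues array is allocated once with x + 1 entries, so taking &ctrl->queues[qid] for qid in (x, y] would point past the end of the allocation.

```c
#include <stdbool.h>

/* Illustrative sketch: the queues array holds x + 1 entries
 * (x I/O queues plus the admin queue), so valid indexes are
 * 0..x; any larger qid would index past the allocation. */
bool queues_index_oob(unsigned int qid, unsigned int x)
{
    unsigned int allocated_entries = x + 1;
    return qid >= allocated_entries;
}
```

For the example in the thread (x == 8, y == 48), qids 9 through 48 would all be out of bounds of the 9-entry array.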
>
> To prove the above theory, we added some debug prints when using x == 8 and y == 48:
>
> #creating 8 I/O queues, queue_count == 9, queues points to
> 00000000fd0a0f0f
Yes, but what is the array size?
> Nov 30 14:02:25 nc9127122.drm.lab.emc.com kernel: nvme nvme15: creating 8 I/O queues. queues 00000000fd0a0f0f queue_count 9
> Nov 30 14:02:25 nc9127122.drm.lab.emc.com kernel: nvme nvme15: mapped 8/0/0 default/read/poll queues.
> Nov 30 14:02:25 nc9127122.drm.lab.emc.com kernel: nvme nvme15:
> Successfully reconnected (1 attempt)
>
> #Timeout occurs that leads to reconnecting:
> Nov 30 14:02:42 nc9127122.drm.lab.emc.com kernel: nvme nvme15: queue 0: timeout request 0x0 type 4
> Nov 30 14:02:42 nc9127122.drm.lab.emc.com kernel: nvme nvme15: starting error recovery
> Nov 30 14:02:42 nc9127122.drm.lab.emc.com kernel: nvme nvme15: failed nvme_keep_alive_end_io error=10
> Nov 30 14:02:42 nc9127122.drm.lab.emc.com kernel: nvme nvme15: Reconnecting in 10 seconds...
>
> #Creating 48 I/O queues, queue_count == 49, queues points again to
> 00000000fd0a0f0f
again, what is the array size?
> Nov 30 14:02:52 nc9127122.drm.lab.emc.com kernel: nvme nvme15: creating 48 I/O queues. queues 00000000fd0a0f0f queue_count 49
>
> Second issue:
> With the same example as above, where x < y: as part of the reconnection
> process, nvme_tcp_configure_io_queues is called. In this
> function, nvme_tcp_start_io_queues is called with the new (y)
> queue_count, which will lead to an error (when sending the I/O connect
> command to the block layer).
But the host won't connect x queues, it will only connect y queues.
Maybe I'm missing something?
>
> Thanks,
> Amit
>
>
> Internal Use - Confidential
I'm assuming that this is not confidential as you are posting this to Linux-nvme, so please drop this notice from your upstream mails.