nvme-tcp host potential bugs

Sagi Grimberg sagi at grimberg.me
Mon Dec 13 01:08:58 PST 2021


> Hi Sagi,
> 
> Regarding the first issue, you are right.
> We missed that the ctrl->queues array allocation is capped by opts->nr_io_queues.
> 
> However, we do believe that the second issue does exist:
> As part of create_ctrl,
> ctrl->queue_count == nr_io_queues + 1 == 49 (in our example num_online_cpus() == 48)
> 
> As part of nvme_tcp_setup_ctrl,
> alloc_io_queues calls set_queue_count, which sets nr_io_queues to 8 (in our case the target ctrl supports up to 8 I/O queues),
> so ctrl->queue_count == 8 + 1 == 9
> 
> A timeout occurs and we reconnect:
> after the reconnect, the target ctrl is able to support up to 128 queues,
> so alloc_io_queues sets nr_io_queues to 48 and ctrl->queue_count == 49
> 
> After 8 iterations, start_io_queues calls nvme_tcp_start_queue with qid 9,
> nvmf_connect_io_queue is called with qid 9,
> and __nvme_submit_sync_cmd calls into the block layer (blk_mq_alloc_request_hctx) with qid 9, which returns an error

Yes, blk_mq_update_nr_hw_queues used to be called before
nvme_start_queues, and that changed in commit 2875b0aecabe2, which
addressed a reset hang when the queue map changes across resets (as the
queue is frozen inside it).
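
For illustration only, here is a stand-alone sketch of that mismatch
(not driver code; the helper names are made up and the numbers follow
the example above). The start loop runs up to the new ctrl->queue_count
while the tagset still only has the hctxs from the previous connect, so
the connect attempt for qid 9 asks for a hardware context that does not
exist yet:

#include <stdio.h>

/*
 * Stand-alone illustration, not driver code: the target granted 8 I/O
 * queues on the first connect, so the tagset has nr_hw_queues == 8,
 * but after the reconnect ctrl->queue_count grows to 49 before
 * blk_mq_update_nr_hw_queues() has run.
 */
#define OLD_NR_HW_QUEUES	8	/* hctx count still in the tagset */
#define NEW_QUEUE_COUNT		49	/* admin queue + 48 I/O queues */

/* Models blk_mq_alloc_request_hctx(): fails for a non-existent hctx. */
static int alloc_request_hctx(int nr_hw_queues, int hctx_idx)
{
	if (hctx_idx >= nr_hw_queues)
		return -1;	/* no such hardware context */
	return 0;
}

/* Models the start_io_queues loop: qid N maps to hctx index N - 1. */
int main(void)
{
	int qid;

	for (qid = 1; qid < NEW_QUEUE_COUNT; qid++) {
		if (alloc_request_hctx(OLD_NR_HW_QUEUES, qid - 1)) {
			printf("connect for qid %d fails: hctx %d does not exist\n",
			       qid, qid - 1);
			return 1;
		}
		printf("qid %d connected on hctx %d\n", qid, qid - 1);
	}
	return 0;
}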

What I think we may need to do is first start the old set of queues
(or fewer, if the new count is smaller), and if the new count is
bigger, then later, after we connect, update nr_hw_queues and start the
new set of queues (ouch).
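
Roughly, the ordering would look like the stand-alone sketch below (all
helpers are made up and only model the sequencing, not the real driver
paths): start only as many queues as the current tagset can back,
resize the tagset once the controller is reachable again, and only then
start the newly added queues.

#include <stdio.h>

static int start_queue(int qid)
{
	printf("start/connect qid %d\n", qid);
	return 0;
}

static void update_nr_hw_queues(int *nr_hw_queues, int new_count)
{
	printf("update nr_hw_queues: %d -> %d\n", *nr_hw_queues, new_count);
	*nr_hw_queues = new_count;
}

int main(void)
{
	int nr_hw_queues = 8;	/* hctx count left over from the previous connect */
	int new_io_queues = 48;	/* I/O queues granted on this reconnect */
	int first_batch = nr_hw_queues < new_io_queues ? nr_hw_queues : new_io_queues;
	int qid;

	/* 1) Start only the queues the current tagset can back. */
	for (qid = 1; qid <= first_batch; qid++)
		start_queue(qid);

	/* 2) Resize the tagset after the controller is reachable again. */
	if (new_io_queues > nr_hw_queues)
		update_nr_hw_queues(&nr_hw_queues, new_io_queues);

	/* 3) Start the added queues, now that their hctxs exist. */
	for (qid = first_batch + 1; qid <= new_io_queues; qid++)
		start_queue(qid);

	return 0;
}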


