[bug report] nvme/rdma: nvme connect failed after offline one cpu on host side

Sagi Grimberg sagi at grimberg.me
Wed Jul 6 08:30:43 PDT 2022


>>> I've updated the subject to better describe the issue.
>>>
>>> So I tried to reproduce this issue on an nvme/rdma environment, and it
>>> was reproducible there as well; here are the steps:
>>>
>>> # echo 0 >/sys/devices/system/cpu/cpu0/online
>>> # dmesg | tail -10
>>> [  781.577235] smpboot: CPU 0 is now offline
>>> # nvme connect -t rdma -a 172.31.45.202 -s 4420 -n testnqn
>>> Failed to write to /dev/nvme-fabrics: Invalid cross-device link
>>> no controller found: failed to write to nvme-fabrics device
>>>
>>> # dmesg
>>> [  781.577235] smpboot: CPU 0 is now offline
>>> [  799.471627] nvme nvme0: creating 39 I/O queues.
>>> [  801.053782] nvme nvme0: mapped 39/0/0 default/read/poll queues.
>>> [  801.064149] nvme nvme0: Connect command failed, error wo/DNR bit: -16402
>>> [  801.073059] nvme nvme0: failed to connect queue: 1 ret=-18
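
For reference, the nvme-cli error and the ret=-18 above are the same
failure seen from both ends: errno 18 is EXDEV, whose message is
"Invalid cross-device link", so both come from a single -EXDEV return
in the kernel. A trivial userspace check (plain C, nothing
nvme-specific) confirms the mapping:

	#include <errno.h>
	#include <stdio.h>
	#include <string.h>

	int main(void)
	{
		/* EXDEV is 18 on Linux; its strerror() text is the
		 * "Invalid cross-device link" that nvme-cli reports
		 * when the write to /dev/nvme-fabrics fails. */
		printf("EXDEV = %d: %s\n", EXDEV, strerror(EXDEV));
		return 0;
	}
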
>>
>> This is because of blk_mq_alloc_request_hctx() and was raised before.
>>
>> IIRC there was reluctance to make it allocate a request for an hctx even
>> if its associated mapped cpu is offline.
>>
>> The latest attempt was from Ming:
>> [PATCH V7 0/3] blk-mq: fix blk_mq_alloc_request_hctx
>>
>> Don't know where that went tho...
> 
> The attempt relies on the queue used for connecting the io queue
> using a non-managed irq; unfortunately, that can't be true for all
> drivers, so that approach can't work.

The only consumer of blk_mq_alloc_request_hctx() is nvme-fabrics, so
other drivers don't matter. Maybe we need a different interface that
allows this relaxation.
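
For context, this is roughly the check that produces the -EXDEV; a
simplified paraphrase of blk_mq_alloc_request_hctx() in block/blk-mq.c,
not the verbatim source:

	/*
	 * blk_mq_alloc_request_hctx() must set the request up on a cpu
	 * that is mapped to the requested hctx. If every cpu in the
	 * hctx's cpumask is offline there is nothing to pick, so the
	 * caller (nvmf_connect_io_queue() via __nvme_submit_sync_cmd())
	 * sees -EXDEV, i.e. the ret=-18 in the dmesg above.
	 */
	cpu = cpumask_first_and(hctx->cpumask, cpu_online_mask);
	if (cpu >= nr_cpu_ids)
		return ERR_PTR(-EXDEV);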

> For now, I'd suggest fixing nvme_*_connect_io_queues() to ignore a
> failed io queue; the nvme host can then still be set up with fewer io
> queues.

What happens when the CPU comes back? Not sure we can simply ignore it.

> Otherwise nvme_*_connect_io_queues() could fail easily, especially
> with a 1:1 queue-to-CPU mapping.
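
For illustration, and leaving aside the CPU-coming-back question raised
above, the suggestion would amount to something like the following in
the connect loop. This is a hypothetical sketch, not actual nvme code;
nvme_connect_one_io_queue() is a made-up stand-in for the per-queue
connect step (nvme-rdma/nvme-tcp each have their own):

	int i, ret;

	for (i = 1; i < ctrl->queue_count; i++) {
		ret = nvme_connect_one_io_queue(ctrl, i); /* hypothetical */
		if (ret == -EXDEV) {
			/*
			 * The hctx for this queue maps only to offline
			 * cpus: skip it and run with fewer io queues
			 * instead of failing the whole controller setup.
			 */
			dev_warn(ctrl->device,
				 "skipping io queue %d (no online cpu)\n",
				 i);
			continue;
		}
		if (ret)
			return ret;
	}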


