[bug report] NVMe/IB: reset_controller needs more than 1 min

Max Gurtovoy mgurtovoy at nvidia.com
Wed Feb 23 03:20:03 PST 2022


On 2/23/2022 12:30 PM, Sagi Grimberg wrote:
>
>> Hi Yi Zhang,
>>
>> thanks for testing the patches.
>>
>> Can you provide more info on the time it took with both kernels?
>>
>> The patches aren't intended to decrease this time, but to restart the KA
>> at an early stage, as soon as we create the AQ.
>
> Still not sure why this addresses the problem, because every IO queue
> connect should reset the keep-alive timer in the target.

Right, in the NVMf connect, not in the transport connect.

You first allocate all IO queues (which takes time) and only then
nvmf_connect all of them.

I guess that's when you hit the timeout.

Between nvme_rdma_alloc_io_queues and nvme_rdma_start_io_queues, there is
no reason the admin_q can't send keep-alives.
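To make the timing argument concrete, here is a toy C model of the target's keep-alive timer during a reconnect. It is a hypothetical sketch, not kernel code. It assumes that only the admin-queue connect, keep-alives, and each IO queue's fabrics connect reset the timer. It also assumes that transport-level IO queue allocation sends nothing the target sees, and that the host sends keep-alives every KATO/2 once started. All durations are made-up parameters, not measurements, and `struct reconnect_model` and `target_kato_expires` are illustrative names.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model (hypothetical, not kernel code) of the target-side keep-alive
 * timer while a host reconnects. Modeling assumptions:
 *  - t = 0 is the admin-queue fabrics connect, which resets the timer;
 *  - transport-level IO queue allocation sends nothing the target sees;
 *  - each IO queue's fabrics connect resets the timer when it completes;
 *  - once started, host keep-alives arrive every kato_ms / 2.
 */
struct reconnect_model {
	unsigned int kato_ms;       /* target keep-alive timeout */
	unsigned int nr_io_queues;  /* IO queues to set up */
	unsigned int alloc_ms;      /* transport allocation time per IO queue */
	unsigned int connect_ms;    /* fabrics connect time per IO queue */
	bool ka_started_early;      /* keep-alive starts right after the admin queue */
};

/* Step through time in 1 ms ticks; true if the target's timer would fire. */
static bool target_kato_expires(const struct reconnect_model *m)
{
	assert(m->kato_ms > 0);

	const unsigned int ka_period = m->kato_ms > 1 ? m->kato_ms / 2 : 1;
	/* Phase 1 (allocate all IO queues) ends here; phase 2 (connect them) follows. */
	const unsigned int alloc_end = m->nr_io_queues * m->alloc_ms;
	const unsigned int end = alloc_end + m->nr_io_queues * m->connect_ms;
	unsigned int last_reset = 0;

	for (unsigned int t = 1; t <= end; t++) {
		bool reset = false;

		/* An IO queue fabrics connect completes every connect_ms in phase 2. */
		if (t > alloc_end && m->connect_ms > 0 &&
		    (t - alloc_end) % m->connect_ms == 0)
			reset = true;

		/* A keep-alive on the admin queue, if the host already started it. */
		if (m->ka_started_early && t % ka_period == 0)
			reset = true;

		if (reset)
			last_reset = t;
		else if (t - last_reset > m->kato_ms)
			return true;
	}
	return false;
}
```

Consider hypothetical values such as `kato_ms = 5000` with 64 IO queues at 100 ms of allocation each. Allocation alone then leaves a 6.4 s window with no timer resets. The model therefore reports an expiry when keep-alive starts only after the IO queues, and no expiry when it starts right after the admin queue.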

This is what I tried to push upstream a few years ago...

Anyway, we shouldn't assume anything about the target implementation. We
need to do our best to have working initiator logic.

>
> But if at all, just move the keep-alive start to nvme_init_ctrl_finish;
> don't expose it to drivers...

Yes, back then there was no nvme_init_ctrl_finish code.
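The suggestion of keeping the keep-alive start inside a core hook, rather than exposing it to transport drivers, can be sketched as follows. This is a hypothetical, heavily simplified C illustration. `struct toy_ctrl` and every `toy_*` function are invented stand-ins that only loosely echo the names in this thread (`nvme_init_ctrl_finish`, `nvme_rdma_alloc_io_queues`, `nvme_rdma_start_io_queues`). They are not the real kernel structures or call chain. The only point it shows is the ordering: the transport calls one core hook after its admin queue is up, that hook starts keep-alive, and the slow IO queue allocation happens afterwards.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the controller state; not struct nvme_ctrl. */
struct toy_ctrl {
	bool admin_queue_live;
	bool keep_alive_running;
	bool ka_running_during_io_alloc; /* recorded by the toy transport below */
};

/* ---- "core" side: in a real split this would live only in the core ---- */

/* Not exported: transports never start keep-alive themselves. */
static void toy_start_keep_alive(struct toy_ctrl *ctrl)
{
	ctrl->keep_alive_running = true;
}

/* Core hook a transport calls once its admin queue is connected. */
static int toy_init_ctrl_finish(struct toy_ctrl *ctrl)
{
	assert(ctrl->admin_queue_live);
	/* Identify/configure steps omitted. */
	toy_start_keep_alive(ctrl);
	return 0;
}

/* ---- "transport" side: hypothetical RDMA-like driver ---- */

static int toy_rdma_configure_admin_queue(struct toy_ctrl *ctrl)
{
	ctrl->admin_queue_live = true; /* admin queue allocated and connected */
	return toy_init_ctrl_finish(ctrl);
}

static int toy_rdma_alloc_io_queues(struct toy_ctrl *ctrl)
{
	/* Slow transport-level allocation would happen here. The admin queue
	 * can already carry keep-alives; this toy only records that fact. */
	ctrl->ka_running_during_io_alloc = ctrl->keep_alive_running;
	return 0;
}

static int toy_rdma_start_io_queues(struct toy_ctrl *ctrl)
{
	(void)ctrl; /* fabrics connect of each IO queue would happen here */
	return 0;
}

static int toy_rdma_setup_ctrl(struct toy_ctrl *ctrl)
{
	int ret = toy_rdma_configure_admin_queue(ctrl);

	if (ret)
		return ret;
	ret = toy_rdma_alloc_io_queues(ctrl);
	if (ret)
		return ret;
	return toy_rdma_start_io_queues(ctrl);
}
```

With this split, every transport gets the early keep-alive start simply by calling the core hook at the right point, and there is no separate "start keep-alive" entry point for drivers to call or forget.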


More information about the Linux-nvme mailing list