[bug report] NVMe/IB: reset_controller needs more than 1 min
Sagi Grimberg
sagi at grimberg.me
Mon Dec 13 01:04:59 PST 2021
>>>>>> Hello
>>>>>>
>>>>>> Gentle ping here; this issue still exists on the latest 5.13-rc7
>>>>>>
>>>>>> # time nvme reset /dev/nvme0
>>>>>>
>>>>>> real 0m12.636s
>>>>>> user 0m0.002s
>>>>>> sys 0m0.005s
>>>>>> # time nvme reset /dev/nvme0
>>>>>>
>>>>>> real 0m12.641s
>>>>>> user 0m0.000s
>>>>>> sys 0m0.007s
>>>>>
>>>>> Strange that even normal resets take so long...
>>>>> What device are you using?
>>>>
>>>> Hi Sagi
>>>>
>>>> Here is the device info:
>>>> Mellanox Technologies MT27700 Family [ConnectX-4]
>>>>
>>>>>
>>>>>> # time nvme reset /dev/nvme0
>>>>>>
>>>>>> real 1m16.133s
>>>>>> user 0m0.000s
>>>>>> sys 0m0.007s
>>>>>
>>>>> There seems to be a spurious command timeout here, but maybe this
>>>>> is because the queues take so long to connect that the target's
>>>>> keep-alive timer expires.
>>>>>
>>>>> Does this patch help?
>>>>
>>>> The issue still exists; let me know if you need any more testing. :)
>>>
>>> Hi Sagi
>>> Ping, this issue can still be reproduced on the latest
>>> linux-block/for-next. Do you have a chance to recheck it? Thanks.
>>
>> Can you check if it happens with the below patch:
>
> Hi Sagi
> It is still reproducible with the change; here is the log:
>
> # time nvme reset /dev/nvme0
>
> real 0m12.973s
> user 0m0.000s
> sys 0m0.006s
> # time nvme reset /dev/nvme0
>
> real 1m15.606s
> user 0m0.000s
> sys 0m0.007s
Does it speed up if you use fewer queues (i.e. connect with -i 4)?
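
For example, a reconnect with the queue count capped at 4 could look
like this (the transport address, service id and NQN below are just
placeholders for your setup):

# nvme disconnect -d /dev/nvme0
# nvme connect -t rdma -a <target addr> -s 4420 -n <subsys nqn> -i 4
# time nvme reset /dev/nvme0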
>
> # dmesg | grep nvme
> [ 900.634877] nvme nvme0: resetting controller
> [ 909.026958] nvme nvme0: creating 40 I/O queues.
> [ 913.604297] nvme nvme0: mapped 40/0/0 default/read/poll queues.
> [ 917.600993] nvme nvme0: resetting controller
> [ 988.562230] nvme nvme0: I/O 2 QID 0 timeout
> [ 988.567607] nvme nvme0: Property Set error: 881, offset 0x14
> [ 988.608181] nvme nvme0: creating 40 I/O queues.
> [ 993.203495] nvme nvme0: mapped 40/0/0 default/read/poll queues.
>
> BTW, this issue cannot be reproduced in my NVMe/RoCE environment.
Then I think that we need the rdma folks to help here...
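
If you get a chance, a function_graph trace of the queue setup during
the slow reset would also help narrow down where the time goes. A rough
sketch, assuming tracefs is available under /sys/kernel/debug (the
filter pattern is just a guess at the interesting functions):

# cd /sys/kernel/debug/tracing
# echo 'nvme_rdma_*' > set_ftrace_filter
# echo function_graph > current_tracer
# time nvme reset /dev/nvme0
# cat trace > /tmp/nvme_reset_trace.txt
# echo nop > current_tracer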