[bug report] NVMe/IB: reset_controller need more than 1min
Sagi Grimberg
sagi at grimberg.me
Sun Dec 12 01:45:47 PST 2021
On 12/11/21 5:01 AM, Yi Zhang wrote:
> On Fri, Jun 25, 2021 at 12:14 AM Yi Zhang <yi.zhang at redhat.com> wrote:
>>
>> On Thu, Jun 24, 2021 at 5:32 AM Sagi Grimberg <sagi at grimberg.me> wrote:
>>>
>>>
>>>> Hello
>>>>
>>>> Gentle ping here; this issue still exists on the latest 5.13-rc7.
>>>>
>>>> # time nvme reset /dev/nvme0
>>>>
>>>> real 0m12.636s
>>>> user 0m0.002s
>>>> sys 0m0.005s
>>>> # time nvme reset /dev/nvme0
>>>>
>>>> real 0m12.641s
>>>> user 0m0.000s
>>>> sys 0m0.007s
>>>
>>> Strange that even normal resets take so long...
>>> What device are you using?
>>
>> Hi Sagi
>>
>> Here is the device info:
>> Mellanox Technologies MT27700 Family [ConnectX-4]
>>
>>>
>>>> # time nvme reset /dev/nvme0
>>>>
>>>> real 1m16.133s
>>>> user 0m0.000s
>>>> sys 0m0.007s
>>>
>>> There seems to be a spurious command timeout here, but maybe this
>>> is due to the fact that the queues take so long to connect and
>>> the target expires the keep-alive timer.
>>>
>>> Does this patch help?
>>
>> The issue still exists; let me know if you need more testing. :)
>
> Hi Sagi
> Ping. This issue can still be reproduced on the latest
> linux-block/for-next; could you take another look? Thanks.
Can you check whether it still happens with the patch below:
--
diff --git a/drivers/nvme/target/fabrics-cmd.c b/drivers/nvme/target/fabrics-cmd.c
index f91a56180d3d..6e5aadfb07a0 100644
--- a/drivers/nvme/target/fabrics-cmd.c
+++ b/drivers/nvme/target/fabrics-cmd.c
@@ -191,6 +191,14 @@ static u16 nvmet_install_queue(struct nvmet_ctrl *ctrl, struct nvmet_req *req)
 		}
 	}
 
+	/*
+	 * Controller establishment flow may take some time, and the host may
+	 * not send us keep-alive during this period, hence reset the
+	 * traffic based keep-alive timer so we don't trigger a
+	 * controller teardown as a result of a keep-alive expiration.
+	 */
+	ctrl->reset_tbkas = true;
+
 	return 0;
 
 err:
--