[bug report] NVMe/IB: reset_controller need more than 1min

Max Gurtovoy mgurtovoy at nvidia.com
Tue Feb 15 06:30:55 PST 2022


Thanks Yi Zhang.

A few years ago I sent some patches that were supposed to fix the KA 
(keep-alive) mechanism, but eventually they weren't accepted.

I haven't tested them since, but maybe you can run some tests with them.

The attached patches are partial and cover only the RDMA transport, for 
your testing.
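
If it helps, applying them for a quick test should just be the usual git am
(this assumes you save the three attachments under their format-patch names):

  git am 0001-*.patch 0002-*.patch 0003-*.patch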

If they work for you, we can work on this again and argue for its correctness.

Please don't use these patches together with the workaround we suggested earlier.

-Max.

On 2/15/2022 3:52 PM, Yi Zhang wrote:
> Hi Sagi/Max
>
> Changing the value to 10 or 15 fixed the timeout issue.
> But the reset operation still needs more than 12s in my environment. I
> also tried disabling pi_enable; the reset operation then drops back
> to 3s, so it seems the added 9s is due to the PI-enabled code path.
>
> On Mon, Feb 14, 2022 at 8:12 PM Max Gurtovoy <mgurtovoy at nvidia.com> wrote:
>>
>> On 2/14/2022 1:32 PM, Sagi Grimberg wrote:
>>>> Hi Sagi/Max
>>>> Here are more findings with the bisect:
>>>>
>>>> The time for the reset operation changed from 3s[1] to 12s[2] after
>>>> commit[3], and after commit[4] the reset operation times out at the
>>>> second reset[5]; let me know if you need any testing for it, thanks.
>>> Does this at least eliminate the timeout?
>>> --
>>> diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
>>> index a162f6c6da6e..60e415078893 100644
>>> --- a/drivers/nvme/host/nvme.h
>>> +++ b/drivers/nvme/host/nvme.h
>>> @@ -25,7 +25,7 @@ extern unsigned int nvme_io_timeout;
>>>   extern unsigned int admin_timeout;
>>>   #define NVME_ADMIN_TIMEOUT     (admin_timeout * HZ)
>>>
>>> -#define NVME_DEFAULT_KATO      5
>>> +#define NVME_DEFAULT_KATO      10
>>>
>>>   #ifdef CONFIG_ARCH_NO_SG_CHAIN
>>>   #define  NVME_INLINE_SG_CNT  0
>>> --
>>>
>> Or, for the initial test, you can use the --keep-alive-tmo=<10 or 15> flag
>> in the connect command.
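
For example, something like this (the address, port and NQN below are
placeholders for your setup):

  nvme connect -t rdma -a <traddr> -s 4420 -n <subsysnqn> --keep-alive-tmo=15
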
>>
>>>> [1]
>>>> # time nvme reset /dev/nvme0
>>>>
>>>> real 0m3.049s
>>>> user 0m0.000s
>>>> sys 0m0.006s
>>>> [2]
>>>> # time nvme reset /dev/nvme0
>>>>
>>>> real 0m12.498s
>>>> user 0m0.000s
>>>> sys 0m0.006s
>>>> [3]
>>>> commit 5ec5d3bddc6b912b7de9e3eb6c1f2397faeca2bc (HEAD)
>>>> Author: Max Gurtovoy <maxg at mellanox.com>
>>>> Date:   Tue May 19 17:05:56 2020 +0300
>>>>
>>>>       nvme-rdma: add metadata/T10-PI support
>>>>
>>>> [4]
>>>> commit a70b81bd4d9d2d6c05cfe6ef2a10bccc2e04357a (HEAD)
>>>> Author: Hannes Reinecke <hare at suse.de>
>>>> Date:   Fri Apr 16 13:46:20 2021 +0200
>>>>
>>>>       nvme: sanitize KATO setting
>>> This change effectively changed the keep-alive timeout
>>> from 15 to 5 and modified the host to send keepalives every
>>> 2.5 seconds instead of 5.
>>>
>>> I guess that, combined with the fact that it now takes longer to
>>> create and delete rdma resources (either QPs or MRs),
>>> it starts to time out in setups where there are a lot of
>>> queues.
>
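
To put rough numbers on Sagi's point above: the default KATO is now 5s with
keep-alives sent about every 2.5s, so there is much less slack for the longer
RDMA QP/MR setup and teardown than there was with the old 15s timeout. To
double-check which keep-alive timeout your controller actually negotiated and
how long the reset takes, something like this should do (0x0f is the Keep
Alive Timer feature, reported in milliseconds; /dev/nvme0 as in your example):

  nvme get-feature /dev/nvme0 -f 0x0f
  time nvme reset /dev/nvme0
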
-------------- next part --------------
From 4f5629786ceaa6912d5c585c18cc5d83fa346ea9 Mon Sep 17 00:00:00 2001
From: Max Gurtovoy <mgurtovoy at nvidia.com>
Date: Tue, 10 Apr 2018 15:21:29 +0000
Subject: [PATCH 1/3] Revert "nvme: unexport nvme_start_keep_alive"

This reverts commit c8799eee39e7523e5e0be10f8950b11cb66085bd.

nvme_start_keep_alive() will be used by the transport drivers
to fix keep-alive synchronization between the NVMe-oF target and host.

Signed-off-by: Max Gurtovoy <mgurtovoy at nvidia.com>
---
 drivers/nvme/host/core.c | 3 ++-
 drivers/nvme/host/nvme.h | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 961a5f8a44d2..46f2d85d4f31 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1284,13 +1284,14 @@ static void nvme_keep_alive_work(struct work_struct *work)
 	blk_execute_rq_nowait(rq, false, nvme_keep_alive_end_io);
 }
 
-static void nvme_start_keep_alive(struct nvme_ctrl *ctrl)
+void nvme_start_keep_alive(struct nvme_ctrl *ctrl)
 {
 	if (unlikely(ctrl->kato == 0))
 		return;
 
 	nvme_queue_keep_alive_work(ctrl);
 }
+EXPORT_SYMBOL_GPL(nvme_start_keep_alive);
 
 void nvme_stop_keep_alive(struct nvme_ctrl *ctrl)
 {
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index a162f6c6da6e..1610ec764bfc 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -729,6 +729,7 @@ int nvme_get_features(struct nvme_ctrl *dev, unsigned int fid,
 		      unsigned int dword11, void *buffer, size_t buflen,
 		      u32 *result);
 int nvme_set_queue_count(struct nvme_ctrl *ctrl, int *count);
+void nvme_start_keep_alive(struct nvme_ctrl *ctrl);
 void nvme_stop_keep_alive(struct nvme_ctrl *ctrl);
 int nvme_reset_ctrl(struct nvme_ctrl *ctrl);
 int nvme_reset_ctrl_sync(struct nvme_ctrl *ctrl);
-- 
2.18.1

-------------- next part --------------
From cbd2e21bc4487e345d37aa643404eab495c05e25 Mon Sep 17 00:00:00 2001
From: Max Gurtovoy <mgurtovoy at nvidia.com>
Date: Tue, 10 Apr 2018 16:30:18 +0000
Subject: [PATCH 2/3] nvme: remove association between ctrl and keep-alive

The keep-alive mechanism is an admin queue property and
should be activated/deactivated during admin queue
creation/destruction.

Signed-off-by: Max Gurtovoy <mgurtovoy at nvidia.com>
---
 drivers/nvme/host/core.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 46f2d85d4f31..34b006a77270 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4389,7 +4389,6 @@ EXPORT_SYMBOL_GPL(nvme_complete_async_event);
 void nvme_stop_ctrl(struct nvme_ctrl *ctrl)
 {
 	nvme_mpath_stop(ctrl);
-	nvme_stop_keep_alive(ctrl);
 	nvme_stop_failfast_work(ctrl);
 	flush_work(&ctrl->async_event_work);
 	cancel_work_sync(&ctrl->fw_act_work);
@@ -4398,8 +4397,6 @@ EXPORT_SYMBOL_GPL(nvme_stop_ctrl);
 
 void nvme_start_ctrl(struct nvme_ctrl *ctrl)
 {
-	nvme_start_keep_alive(ctrl);
-
 	nvme_enable_aen(ctrl);
 
 	if (ctrl->queue_count > 1) {
-- 
2.18.1

-------------- next part --------------
From 3a40795e1c47aae8297a1dcaa821f63fea5e4baa Mon Sep 17 00:00:00 2001
From: Max Gurtovoy <mgurtovoy at nvidia.com>
Date: Tue, 10 Apr 2018 16:33:06 +0000
Subject: [PATCH 3/3] nvme-rdma: add keep-alive mechanism as admin_q property

Activate/deactivate the keep-alive mechanism during admin queue
creation/destruction and remove its association with the nvme ctrl.

Signed-off-by: Max Gurtovoy <mgurtovoy at nvidia.com>
---
 drivers/nvme/host/rdma.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 993e3a076a41..ab71f7d3d6a5 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -925,6 +925,8 @@ static int nvme_rdma_configure_admin_queue(struct nvme_rdma_ctrl *ctrl,
 	if (error)
 		goto out_quiesce_queue;
 
+	nvme_start_keep_alive(&ctrl->ctrl);
+
 	return 0;
 
 out_quiesce_queue:
@@ -1026,6 +1028,7 @@ static int nvme_rdma_configure_io_queues(struct nvme_rdma_ctrl *ctrl, bool new)
 static void nvme_rdma_teardown_admin_queue(struct nvme_rdma_ctrl *ctrl,
 		bool remove)
 {
+	nvme_stop_keep_alive(&ctrl->ctrl);
 	nvme_stop_admin_queue(&ctrl->ctrl);
 	blk_sync_queue(ctrl->ctrl.admin_q);
 	nvme_rdma_stop_queue(&ctrl->queues[0]);
@@ -1199,7 +1202,6 @@ static void nvme_rdma_error_recovery_work(struct work_struct *work)
 	struct nvme_rdma_ctrl *ctrl = container_of(work,
 			struct nvme_rdma_ctrl, err_work);
 
-	nvme_stop_keep_alive(&ctrl->ctrl);
 	flush_work(&ctrl->ctrl.async_event_work);
 	nvme_rdma_teardown_io_queues(ctrl, false);
 	nvme_start_queues(&ctrl->ctrl);
-- 
2.18.1


