[bug report] NVMe/IB: reset_controller need more than 1min
Sagi Grimberg
sagi at grimberg.me
Mon Feb 14 03:32:02 PST 2022
> Hi Sagi/Max
> Here are more findings with the bisect:
>
> The time for reset operation changed from 3s[1] to 12s[2] after
> commit[3], and after commit[4], the reset operation timeout at the
> second reset[5], let me know if you need any testing for it, thanks.
Does this at least eliminate the timeout?
--
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index a162f6c6da6e..60e415078893 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -25,7 +25,7 @@ extern unsigned int nvme_io_timeout;
extern unsigned int admin_timeout;
#define NVME_ADMIN_TIMEOUT (admin_timeout * HZ)
-#define NVME_DEFAULT_KATO 5
+#define NVME_DEFAULT_KATO 10
#ifdef CONFIG_ARCH_NO_SG_CHAIN
#define NVME_INLINE_SG_CNT 0
--
>
> [1]
> # time nvme reset /dev/nvme0
>
> real 0m3.049s
> user 0m0.000s
> sys 0m0.006s
> [2]
> # time nvme reset /dev/nvme0
>
> real 0m12.498s
> user 0m0.000s
> sys 0m0.006s
> [3]
> commit 5ec5d3bddc6b912b7de9e3eb6c1f2397faeca2bc (HEAD)
> Author: Max Gurtovoy <maxg at mellanox.com>
> Date: Tue May 19 17:05:56 2020 +0300
>
> nvme-rdma: add metadata/T10-PI support
>
> [4]
> commit a70b81bd4d9d2d6c05cfe6ef2a10bccc2e04357a (HEAD)
> Author: Hannes Reinecke <hare at suse.de>
> Date: Fri Apr 16 13:46:20 2021 +0200
>
> nvme: sanitize KATO setting
This change effectively lowered the keep-alive timeout
from 15 to 5 seconds and made the host send keepalives every
2.5 seconds instead of every 5.
I guess that, combined with the fact that it now takes longer
to create and delete rdma resources (either qps or mrs),
it starts to time out in setups where there are a lot of
queues.
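
To make the arithmetic above concrete, here is a rough sketch of the
timing model. It is not the kernel's code: the struct and function names
are hypothetical, the "before" values are simply the 15s/5s figures
quoted above, and the "send at half the timeout" rule is assumed to hold
for any KATO value, including the 10 from the test patch.

```c
/* Illustrative model only; names are hypothetical, not kernel symbols. */

struct kato_cfg {
	unsigned int timeout_ms;   /* keep-alive timeout seen by the controller */
	unsigned int interval_ms;  /* how often the host sends a keep-alive */
};

/* Values before "nvme: sanitize KATO setting", as stated in this mail. */
struct kato_cfg kato_before_change(void)
{
	struct kato_cfg c = { .timeout_ms = 15000, .interval_ms = 5000 };
	return c;
}

/* Assumed rule after the change: host sends at half the timeout. */
struct kato_cfg kato_after_change(unsigned int kato_sec)
{
	struct kato_cfg c;

	c.timeout_ms = kato_sec * 1000;
	c.interval_ms = c.timeout_ms / 2;
	return c;
}

/*
 * Rough measure of how late a keep-alive may be sent before the
 * timeout expires: timeout minus the send interval.
 */
unsigned int kato_slack_ms(struct kato_cfg c)
{
	return c.timeout_ms - c.interval_ms;
}
```

Under these assumptions the slack goes from 10000 ms before the change
to 2500 ms with KATO 5, and would be 5000 ms with KATO 10.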