[bug report] reset_controller operation on NVMe/IB need more than 10s
Sagi Grimberg
sagi at grimberg.me
Mon Aug 31 17:28:28 EDT 2020
> Change the Subject...
>
> I finally found that the OOM was not triggered by reset_controller itself.
> My 500 reset_controller iterations did not finish within 1 hour, which
> triggered the test harness watchdog and led to the OOM.
>
> So the issue here is that the reset_controller operation now needs more
> than 10s[1]. By bisecting I found that it was introduced by commit[2];
> without this patch it needs only about 2 seconds[3].
>
> [1]
> # time echo 1 >/sys/block/nvme0n1/device/nvme0/reset_controller
>
> real 0m10.392s
> user 0m0.000s
> sys 0m0.000s
>
> [2]
> commit 5ec5d3bddc6b912b7de9e3eb6c1f2397faeca2bc
> Author: Max Gurtovoy <maxg at mellanox.com>
> Date: Tue May 19 17:05:56 2020 +0300
>
> nvme-rdma: add metadata/T10-PI support
>
> For capable HCAs (e.g. ConnectX-5/ConnectX-6) this will allow end-to-end
> protection information passthrough and validation for NVMe over RDMA
> transport. Metadata offload support was implemented over the new RDMA
> signature verbs API and it is enabled for capable controllers.
>
> Signed-off-by: Max Gurtovoy <maxg at mellanox.com>
> Signed-off-by: Israel Rukshin <israelr at mellanox.com>
> Signed-off-by: Christoph Hellwig <hch at lst.de>
>
> [3] without the above patch
> # time echo 1 >/sys/block/nvme0n1/device/nvme0/reset_controller
>
> real 0m2.132s
> user 0m0.000s
> sys 0m0.000s
Max, are you looking into this?
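
For anyone reproducing the numbers above, here is a minimal sketch of a timing
loop, assuming a POSIX shell and a date(1) that supports +%s. The function name
time_resets and its arguments are hypothetical, not part of any nvme tooling.
At roughly 10s per reset, 500 iterations come to about 5000s, which is over
the 1 hour watchdog limit; at roughly 2s they come to about 1000s.

```shell
#!/bin/sh
# Hypothetical helper: write "1" to a reset_controller-style sysfs
# attribute COUNT times and report the total wall-clock time in seconds.
#   $1 = path to the attribute (for example the reset_controller file)
#   $2 = number of resets to issue
time_resets() {
    path=$1
    count=$2
    i=0
    start=$(date +%s)
    while [ "$i" -lt "$count" ]; do
        # Each write blocks until the reset request returns.
        echo 1 > "$path" || return 1
        i=$((i + 1))
    done
    end=$(date +%s)
    echo "$count resets took $((end - start))s"
}
```

It could be run as root with the path from the report, e.g.
time_resets /sys/block/nvme0n1/device/nvme0/reset_controller 500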