[bug report] reset_controller operation on NVMe/IB need more than 10s

Sagi Grimberg sagi at grimberg.me
Mon Aug 31 17:28:28 EDT 2020


> Change the Subject...
> 
> Finally I found that the OOM was not triggered by reset_controller
> itself: my 500 reset_controller iterations did not finish within 1 hour,
> which triggered the test harness watchdog and led to the OOM.
> 
> So the issue here is that the reset_controller operation now needs more
> than 10s [1]. By bisecting I found it was introduced by commit [2];
> without this patch it takes only about 2 seconds [3].
> 
> [1]
> # time echo 1 >/sys/block/nvme0n1/device/nvme0/reset_controller
> 
> real    0m10.392s
> user    0m0.000s
> sys    0m0.000s
> 
> [2]
> commit 5ec5d3bddc6b912b7de9e3eb6c1f2397faeca2bc
> Author: Max Gurtovoy <maxg at mellanox.com>
> Date:   Tue May 19 17:05:56 2020 +0300
> 
>      nvme-rdma: add metadata/T10-PI support
> 
>      For capable HCAs (e.g. ConnectX-5/ConnectX-6) this will allow end-to-end
>      protection information passthrough and validation for NVMe over RDMA
>      transport. Metadata offload support was implemented over the new RDMA
>      signature verbs API and it is enabled for capable controllers.
> 
>      Signed-off-by: Max Gurtovoy <maxg at mellanox.com>
>      Signed-off-by: Israel Rukshin <israelr at mellanox.com>
>      Signed-off-by: Christoph Hellwig <hch at lst.de>
> 
> [3] without the above patch
> # time echo 1 >/sys/block/nvme0n1/device/nvme0/reset_controller
> 
> real    0m2.132s
> user    0m0.000s
> sys    0m0.000s

Max, are you looking into this?
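
For anyone who wants to repeat the measurement from [1] and [3] over many
iterations, a minimal sketch follows. It assumes the sysfs path quoted in the
report; reset_loop is a hypothetical helper name, not part of any existing
tool, and the reporter's actual test harness is not shown in this thread.

```shell
#!/bin/sh
# Sketch only: time repeated controller resets through sysfs.
# reset_loop is a hypothetical helper introduced for illustration.

# reset_loop ATTR COUNT
# Writes 1 to ATTR COUNT times, one write after the other, and prints
# the elapsed wall-clock time in whole seconds.
reset_loop() {
    attr=$1
    count=$2
    start=$(date +%s)
    i=0
    while [ "$i" -lt "$count" ]; do
        # Each write triggers one controller reset when ATTR is the
        # reset_controller attribute; stop on the first failed write.
        echo 1 > "$attr" || return 1
        i=$((i + 1))
    done
    end=$(date +%s)
    echo $((end - start))
}

# Example invocation (needs root and an NVMe/IB controller, path as in [1]):
#   reset_loop /sys/block/nvme0n1/device/nvme0/reset_controller 500
```

As a rough estimate, assuming back-to-back resets with no other overhead,
500 resets at 10.392s each take about 5196s (roughly 87 minutes), which
exceeds a 1 hour watchdog, while 500 resets at 2.132s each take about 1066s
(roughly 18 minutes).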


