[bug report] reset_controller operation on NVMe/IB needs more than 10s
Yi Zhang
yi.zhang at redhat.com
Mon Aug 31 01:25:18 EDT 2020
Changing the subject...
I finally found that the OOM was not triggered by reset_controller itself.
Instead, my 500-iteration reset_controller loop did not finish within 1 hour,
which triggered the test harness watchdog and led to the OOM.
So the real issue is that a reset_controller operation now takes more than
10s [1]. By bisecting, I found the regression was introduced by commit [2];
without this patch, a reset takes only about 2 seconds [3].
[1]
# time echo 1 >/sys/block/nvme0n1/device/nvme0/reset_controller
real 0m10.392s
user 0m0.000s
sys 0m0.000s
[2]
commit 5ec5d3bddc6b912b7de9e3eb6c1f2397faeca2bc
Author: Max Gurtovoy <maxg at mellanox.com>
Date: Tue May 19 17:05:56 2020 +0300
nvme-rdma: add metadata/T10-PI support
For capable HCAs (e.g. ConnectX-5/ConnectX-6) this will allow end-to-end
protection information passthrough and validation for NVMe over RDMA
transport. Metadata offload support was implemented over the new RDMA
signature verbs API and it is enabled for capable controllers.
Signed-off-by: Max Gurtovoy <maxg at mellanox.com>
Signed-off-by: Israel Rukshin <israelr at mellanox.com>
Signed-off-by: Christoph Hellwig <hch at lst.de>
[3] without the above patch
# time echo 1 >/sys/block/nvme0n1/device/nvme0/reset_controller
real 0m2.132s
user 0m0.000s
sys 0m0.000s
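A back-of-the-envelope check shows why the 500-iteration loop trips the 1-hour watchdog. This sketch assumes that every reset takes exactly the single-run times measured in [1] (10.392s) and [3] (2.132s), and it ignores per-iteration overhead. The helper names are illustrative only.

```shell
# Projected wall-clock time for a 500-iteration reset loop, using the
# per-reset timings from [1] (with the patch) and [3] (without it).
# Integer milliseconds avoid needing bc/awk for floating point.
projected_ms() {
    local per_reset_ms=$1 iterations=$2
    echo $((per_reset_ms * iterations))
}

# Format milliseconds as e.g. "86m36s" (sub-second remainder dropped).
fmt_min_sec() {
    local s=$(($1 / 1000))
    printf '%dm%02ds\n' $((s / 60)) $((s % 60))
}

WATCHDOG_MS=$((60 * 60 * 1000))   # 1-hour test harness watchdog

for per_reset in 10392 2132; do
    total=$(projected_ms "$per_reset" 500)
    if ((total > WATCHDOG_MS)); then verdict="exceeds"; else verdict="within"; fi
    echo "${per_reset}ms/reset x 500 = $(fmt_min_sec "$total") ($verdict the 1h watchdog)"
done
```

With the patch, the loop needs about 86m36s, well past the watchdog. Without it, the loop needs about 17m46s.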
On 8/28/20 11:57 PM, Sagi Grimberg wrote:
>
>> Hello
>>
>> From 5.8-rc1 stress reset_controller operation will lead system OOM
>> on both target/host side.
>> This issue cannot be reproduced on v5.7 and still can be reproduced
>> on v5.9-rc1, let me know if you need more info, thanks.
>
> Does kmemleak complain during the test?
>
More information about the Linux-nvme mailing list