[bug report] NVMe/IB: kmemleak observed on 5.17.0-rc5 with nvme-rdma testing

Sagi Grimberg sagi at grimberg.me
Mon Mar 21 02:25:46 PDT 2022


>>>>> # nvme connect to target
>>>>> # nvme reset /dev/nvme0
>>>>> # nvme disconnect-all
>>>>> # sleep 10
>>>>> # echo scan > /sys/kernel/debug/kmemleak
>>>>> # sleep 60
>>>>> # cat /sys/kernel/debug/kmemleak
>>>>>
>>>> Thanks I was able to repro it with the above commands.
>>>>
>>>> Still not clear where the leak is, but I do see some non-symmetric
>>>> code in the error flows that we need to fix, plus the keep-alive
>>>> timing movement.
>>>>
>>>> It will take some time for me to debug this.
>>>>
>>>> Can you repro it with tcp transport as well?
>>>
>>> Yes, nvme/tcp can also reproduce it; here is the log:

Looks like the offending commit was 8e141f9eb803 ("block: drain file 
system I/O on del_gendisk") which moved the call-site for a reason.

However, rq_qos_exit() should be reentrant-safe (a second call finds
nothing left to tear down), so can you verify that this change
eliminates the issue as well?
--
diff --git a/block/blk-core.c b/block/blk-core.c
index 94bf37f8e61d..6ccc02a41f25 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -323,6 +323,7 @@ void blk_cleanup_queue(struct request_queue *q)

         blk_queue_flag_set(QUEUE_FLAG_DEAD, q);

+       rq_qos_exit(q);
         blk_sync_queue(q);
         if (queue_is_mq(q)) {
                 blk_mq_cancel_work_sync(q);
--


