[bug report] NVMe/IB: kmemleak observed on 5.17.0-rc5 with nvme-rdma testing

Sagi Grimberg sagi at grimberg.me
Sun Mar 20 06:00:24 PDT 2022


>>> # nvme connect to target
>>> # nvme reset /dev/nvme0
>>> # nvme disconnect-all
>>> # sleep 10
>>> # echo scan > /sys/kernel/debug/kmemleak
>>> # sleep 60
>>> # cat /sys/kernel/debug/kmemleak
>>>
>> Thanks, I was able to repro it with the above commands.
>>
>> Still not clear where the leak is, but I do see some non-symmetric
>> code in the error flows that we need to fix, plus the keep-alive
>> timing movement.
>>
>> It will take some time for me to debug this.
>>
>> Can you repro it with the tcp transport as well?
> 
> Yes, nvme/tcp also can reproduce it, here is the log:
> 
> unreferenced object 0xffff8881675f7000 (size 192):
>    comm "nvme", pid 3711, jiffies 4296033311 (age 2272.976s)
>    hex dump (first 32 bytes):
>      20 59 04 92 ff ff ff ff 00 00 da 13 81 88 ff ff   Y..............
>      01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>    backtrace:
>      [<00000000adbc7c81>] kmem_cache_alloc_trace+0x10e/0x220
>      [<00000000c04d85be>] blk_iolatency_init+0x4e/0x380
>      [<00000000897ffdaf>] blkcg_init_queue+0x12e/0x610
>      [<000000002653e58d>] blk_alloc_queue+0x400/0x840
>      [<00000000fcb99f3c>] blk_mq_init_queue_data+0x6a/0x100
>      [<00000000486936b6>] nvme_tcp_setup_ctrl+0x70c/0xbe0 [nvme_tcp]
>      [<000000000bb29b26>] nvme_tcp_create_ctrl+0x953/0xbb4 [nvme_tcp]
>      [<00000000ca3d4e54>] nvmf_dev_write+0x44e/0xa39 [nvme_fabrics]
>      [<0000000056b79a25>] vfs_write+0x17e/0x9a0
>      [<00000000a5af6c18>] ksys_write+0xf1/0x1c0
>      [<00000000c035c128>] do_syscall_64+0x3a/0x80
>      [<000000000e5ea863>] entry_SYSCALL_64_after_hwframe+0x44/0xae
> unreferenced object 0xffff8881675f7600 (size 192):
>    comm "nvme", pid 3711, jiffies 4296033320 (age 2272.967s)
>    hex dump (first 32 bytes):
>      20 59 04 92 ff ff ff ff 00 00 22 92 81 88 ff ff   Y........".....
>      01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>    backtrace:
>      [<00000000adbc7c81>] kmem_cache_alloc_trace+0x10e/0x220
>      [<00000000c04d85be>] blk_iolatency_init+0x4e/0x380
>      [<00000000897ffdaf>] blkcg_init_queue+0x12e/0x610
>      [<000000002653e58d>] blk_alloc_queue+0x400/0x840
>      [<00000000fcb99f3c>] blk_mq_init_queue_data+0x6a/0x100
>      [<000000006ca5f9f6>] nvme_tcp_setup_ctrl+0x772/0xbe0 [nvme_tcp]
>      [<000000000bb29b26>] nvme_tcp_create_ctrl+0x953/0xbb4 [nvme_tcp]
>      [<00000000ca3d4e54>] nvmf_dev_write+0x44e/0xa39 [nvme_fabrics]
>      [<0000000056b79a25>] vfs_write+0x17e/0x9a0
>      [<00000000a5af6c18>] ksys_write+0xf1/0x1c0
>      [<00000000c035c128>] do_syscall_64+0x3a/0x80
>      [<000000000e5ea863>] entry_SYSCALL_64_after_hwframe+0x44/0xae
> unreferenced object 0xffff8891fb6a3600 (size 192):
>    comm "nvme", pid 3711, jiffies 4296033511 (age 2272.776s)
>    hex dump (first 32 bytes):
>      20 59 04 92 ff ff ff ff 00 00 5c 1d 81 88 ff ff   Y........\.....
>      01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>    backtrace:
>      [<00000000adbc7c81>] kmem_cache_alloc_trace+0x10e/0x220
>      [<00000000c04d85be>] blk_iolatency_init+0x4e/0x380
>      [<00000000897ffdaf>] blkcg_init_queue+0x12e/0x610
>      [<000000002653e58d>] blk_alloc_queue+0x400/0x840
>      [<00000000fcb99f3c>] blk_mq_init_queue_data+0x6a/0x100
>      [<000000004a3bf20e>] nvme_tcp_setup_ctrl.cold.57+0x868/0xa5d [nvme_tcp]
>      [<000000000bb29b26>] nvme_tcp_create_ctrl+0x953/0xbb4 [nvme_tcp]
>      [<00000000ca3d4e54>] nvmf_dev_write+0x44e/0xa39 [nvme_fabrics]
>      [<0000000056b79a25>] vfs_write+0x17e/0x9a0
>      [<00000000a5af6c18>] ksys_write+0xf1/0x1c0
>      [<00000000c035c128>] do_syscall_64+0x3a/0x80
>      [<000000000e5ea863>] entry_SYSCALL_64_after_hwframe+0x44/0xae

Looks like there is some asymmetry in blk_iolatency: it is initialized
when allocating a request queue, but torn down only when deleting a
gendisk. In nvme we have request queues that will never have a gendisk
corresponding to them (like the admin queue), so their iolatency state
is never freed.

Does this patch eliminate the issue?
--
diff --git a/block/blk-core.c b/block/blk-core.c
index 94bf37f8e61d..6ccc02a41f25 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -323,6 +323,7 @@ void blk_cleanup_queue(struct request_queue *q)

         blk_queue_flag_set(QUEUE_FLAG_DEAD, q);

+       rq_qos_exit(q);
         blk_sync_queue(q);
         if (queue_is_mq(q)) {
                 blk_mq_cancel_work_sync(q);
diff --git a/block/genhd.c b/block/genhd.c
index 54f60ded2ee6..10ff0606c100 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -626,7 +626,6 @@ void del_gendisk(struct gendisk *disk)

         blk_mq_freeze_queue_wait(q);

-       rq_qos_exit(q);
         blk_sync_queue(q);
         blk_flush_integrity();
         /*
--
