[bug report] NVMe/IB: kmemleak observed on 5.17.0-rc5 with nvme-rdma testing

Yi Zhang yi.zhang at redhat.com
Fri Mar 18 23:54:31 PDT 2022


On Thu, Mar 10, 2022 at 7:52 PM Max Gurtovoy <mgurtovoy at nvidia.com> wrote:
>
>
> On 3/9/2022 12:59 AM, Yi Zhang wrote:
> > On Tue, Mar 8, 2022 at 11:51 PM Max Gurtovoy <mgurtovoy at nvidia.com> wrote:
> >> Hi Yi Zhang,
> >>
> >> Please send the commands to repro.
> >>
> >> I ran the following but could not reproduce it:
> >>
> >> for i in `seq 100`; do echo $i &&  cat /sys/kernel/debug/kmemleak &&
> >> echo clear > /sys/kernel/debug/kmemleak && nvme reset /dev/nvme2 &&
> >> sleep 5 && echo scan > /sys/kernel/debug/kmemleak ; done
> > Hi Max
> > Sorry, I should have added more details when I reported it.
> > The kmemleak was observed while I was reproducing the "nvme reset" timeout
> > issue we discussed before [1], and the commands I used are in [2].
> >
> > [1]
> > https://lore.kernel.org/linux-nvme/CAHj4cs_ir917u7Up5PBfwWpZtnVLey69pXXNjFNAjbqQ5vwU0w@mail.gmail.com/T/#m5e6dcc434fc1925b18047c348226cfbc48ffbd14
> > [2]
> > # nvme connect to target
> > # nvme reset /dev/nvme0
> > # nvme disconnect-all
> > # sleep 10
> > # echo scan > /sys/kernel/debug/kmemleak
> > # sleep 60
> > # cat /sys/kernel/debug/kmemleak
> >
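
For anyone who wants to script the steps in [2], here is a rough sketch, not
something that was run as part of this report. Assumptions: every argument is
passed straight through to "nvme connect" (the target details are not given
above), and the DRY_RUN, DEV and KMEMLEAK variables and the function names are
made up for illustration.

```shell
#!/bin/sh
# Sketch of the repro steps in [2]. DRY_RUN, DEV and KMEMLEAK are
# illustrative knobs, not part of nvme-cli or the kernel.
KMEMLEAK=${KMEMLEAK:-/sys/kernel/debug/kmemleak}
DEV=${DEV:-/dev/nvme0}

run() {
    # Print the command in dry-run mode, execute it otherwise.
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

kmemleak_write() {
    # Send a command word such as "scan" or "clear" to kmemleak.
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ echo $1 > $KMEMLEAK"
    else
        echo "$1" > "$KMEMLEAK"
    fi
}

repro_once() {
    # Arguments are the "nvme connect" options for the target under test.
    run nvme connect "$@"
    run nvme reset "$DEV"
    run nvme disconnect-all
    run sleep 10
    kmemleak_write scan
    run sleep 60
    run cat "$KMEMLEAK"
}
```

The block only defines functions. Setting DRY_RUN=1 and calling repro_once
with the connect options prints the command sequence without touching the
system; a real run needs root and a kernel built with kmemleak.
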
> Thanks, I was able to repro it with the above commands.
>
> It is still not clear where the leak is, but I do see some non-symmetric
> code in the error flows that we need to fix, plus the keep-alive timing
> movement.
>
> It will take some time for me to debug this.
>
> Can you repro it with the tcp transport as well?

Yes, it also reproduces with nvme/tcp. Here is the log:

unreferenced object 0xffff8881675f7000 (size 192):
  comm "nvme", pid 3711, jiffies 4296033311 (age 2272.976s)
  hex dump (first 32 bytes):
    20 59 04 92 ff ff ff ff 00 00 da 13 81 88 ff ff   Y..............
    01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<00000000adbc7c81>] kmem_cache_alloc_trace+0x10e/0x220
    [<00000000c04d85be>] blk_iolatency_init+0x4e/0x380
    [<00000000897ffdaf>] blkcg_init_queue+0x12e/0x610
    [<000000002653e58d>] blk_alloc_queue+0x400/0x840
    [<00000000fcb99f3c>] blk_mq_init_queue_data+0x6a/0x100
    [<00000000486936b6>] nvme_tcp_setup_ctrl+0x70c/0xbe0 [nvme_tcp]
    [<000000000bb29b26>] nvme_tcp_create_ctrl+0x953/0xbb4 [nvme_tcp]
    [<00000000ca3d4e54>] nvmf_dev_write+0x44e/0xa39 [nvme_fabrics]
    [<0000000056b79a25>] vfs_write+0x17e/0x9a0
    [<00000000a5af6c18>] ksys_write+0xf1/0x1c0
    [<00000000c035c128>] do_syscall_64+0x3a/0x80
    [<000000000e5ea863>] entry_SYSCALL_64_after_hwframe+0x44/0xae
unreferenced object 0xffff8881675f7600 (size 192):
  comm "nvme", pid 3711, jiffies 4296033320 (age 2272.967s)
  hex dump (first 32 bytes):
    20 59 04 92 ff ff ff ff 00 00 22 92 81 88 ff ff   Y........".....
    01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<00000000adbc7c81>] kmem_cache_alloc_trace+0x10e/0x220
    [<00000000c04d85be>] blk_iolatency_init+0x4e/0x380
    [<00000000897ffdaf>] blkcg_init_queue+0x12e/0x610
    [<000000002653e58d>] blk_alloc_queue+0x400/0x840
    [<00000000fcb99f3c>] blk_mq_init_queue_data+0x6a/0x100
    [<000000006ca5f9f6>] nvme_tcp_setup_ctrl+0x772/0xbe0 [nvme_tcp]
    [<000000000bb29b26>] nvme_tcp_create_ctrl+0x953/0xbb4 [nvme_tcp]
    [<00000000ca3d4e54>] nvmf_dev_write+0x44e/0xa39 [nvme_fabrics]
    [<0000000056b79a25>] vfs_write+0x17e/0x9a0
    [<00000000a5af6c18>] ksys_write+0xf1/0x1c0
    [<00000000c035c128>] do_syscall_64+0x3a/0x80
    [<000000000e5ea863>] entry_SYSCALL_64_after_hwframe+0x44/0xae
unreferenced object 0xffff8891fb6a3600 (size 192):
  comm "nvme", pid 3711, jiffies 4296033511 (age 2272.776s)
  hex dump (first 32 bytes):
    20 59 04 92 ff ff ff ff 00 00 5c 1d 81 88 ff ff   Y........\.....
    01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<00000000adbc7c81>] kmem_cache_alloc_trace+0x10e/0x220
    [<00000000c04d85be>] blk_iolatency_init+0x4e/0x380
    [<00000000897ffdaf>] blkcg_init_queue+0x12e/0x610
    [<000000002653e58d>] blk_alloc_queue+0x400/0x840
    [<00000000fcb99f3c>] blk_mq_init_queue_data+0x6a/0x100
    [<000000004a3bf20e>] nvme_tcp_setup_ctrl.cold.57+0x868/0xa5d [nvme_tcp]
    [<000000000bb29b26>] nvme_tcp_create_ctrl+0x953/0xbb4 [nvme_tcp]
    [<00000000ca3d4e54>] nvmf_dev_write+0x44e/0xa39 [nvme_fabrics]
    [<0000000056b79a25>] vfs_write+0x17e/0x9a0
    [<00000000a5af6c18>] ksys_write+0xf1/0x1c0
    [<00000000c035c128>] do_syscall_64+0x3a/0x80
    [<000000000e5ea863>] entry_SYSCALL_64_after_hwframe+0x44/0xae
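
To make it easier to compare reports between transports, here is a small
sketch that counts the objects in a saved kmemleak report and groups them by
the nvme_*_setup_ctrl frame in the backtrace. The function name
kmemleak_summary is made up, and it assumes an unquoted report on stdin (no
"> " prefixes).

```shell
# Summarize a kmemleak report read from stdin: total number of
# unreferenced objects, plus a count per transport setup function.
kmemleak_summary() {
    awk '
        /^unreferenced object/ { total++ }
        # Frames look like: [<addr>] nvme_tcp_setup_ctrl+0x70c/0xbe0 [nvme_tcp]
        /nvme_[a-z]+_setup_ctrl/ {
            sym = $2
            sub(/[.+].*$/, "", sym)   # drop ".cold.NN" and "+offset/size"
            count[sym]++
        }
        END {
            printf "total %d\n", total
            for (s in count) printf "%s %d\n", s, count[s]
        }
    '
}
```

Fed the tcp log above, this should report three objects, all under
nvme_tcp_setup_ctrl, the same pattern as the original nvme_rdma report.
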



>
> Maybe add some debug prints to catch the exact flow where it happens?
>
> >> -Max.
> >>
> >> On 2/21/2022 1:37 PM, Yi Zhang wrote:
> >>> Hello
> >>>
> >>> The kmemleak report below was triggered when I ran nvme connect/reset/disconnect
> >>> operations on the latest 5.17.0-rc5; please check it.
> >>>
> >>> # cat /sys/kernel/debug/kmemleak
> >>> unreferenced object 0xffff8883e398bc00 (size 192):
> >>>     comm "nvme", pid 2632, jiffies 4295317772 (age 2951.476s)
> >>>     hex dump (first 32 bytes):
> >>>       80 50 84 a3 ff ff ff ff 70 d4 12 67 81 88 ff ff  .P......p..g....
> >>>       01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> >>>     backtrace:
> >>>       [<00000000ecf84f29>] kmem_cache_alloc_trace+0x10e/0x220
> >>>       [<0000000099bbcbaa>] blk_iolatency_init+0x4e/0x380
> >>>       [<00000000e7a59176>] blkcg_init_queue+0x12e/0x610
> >>>       [<00000000aade682c>] blk_alloc_queue+0x400/0x840
> >>>       [<000000007ed43824>] blk_mq_init_queue_data+0x6a/0x100
> >>>       [<00000000cbff6d39>] nvme_rdma_setup_ctrl+0x4ca/0x15f0 [nvme_rdma]
> >>>       [<00000000a309d26c>] nvme_rdma_create_ctrl+0x7e5/0xa9f [nvme_rdma]
> >>>       [<000000007d8b5cca>] nvmf_dev_write+0x44e/0xa39 [nvme_fabrics]
> >>>       [<0000000031d8624b>] vfs_write+0x17e/0x9a0
> >>>       [<00000000471d7945>] ksys_write+0xf1/0x1c0
> >>>       [<00000000a963bc79>] do_syscall_64+0x3a/0x80
> >>>       [<0000000005154fc2>] entry_SYSCALL_64_after_hwframe+0x44/0xae
> >>> unreferenced object 0xffff8883e398a700 (size 192):
> >>>     comm "nvme", pid 2632, jiffies 4295317782 (age 2951.466s)
> >>>     hex dump (first 32 bytes):
> >>>       80 50 84 a3 ff ff ff ff 60 c8 12 67 81 88 ff ff  .P......`..g....
> >>>       01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> >>>     backtrace:
> >>>       [<00000000ecf84f29>] kmem_cache_alloc_trace+0x10e/0x220
> >>>       [<0000000099bbcbaa>] blk_iolatency_init+0x4e/0x380
> >>>       [<00000000e7a59176>] blkcg_init_queue+0x12e/0x610
> >>>       [<00000000aade682c>] blk_alloc_queue+0x400/0x840
> >>>       [<000000007ed43824>] blk_mq_init_queue_data+0x6a/0x100
> >>>       [<000000004f80b965>] nvme_rdma_setup_ctrl+0xf37/0x15f0 [nvme_rdma]
> >>>       [<00000000a309d26c>] nvme_rdma_create_ctrl+0x7e5/0xa9f [nvme_rdma]
> >>>       [<000000007d8b5cca>] nvmf_dev_write+0x44e/0xa39 [nvme_fabrics]
> >>>       [<0000000031d8624b>] vfs_write+0x17e/0x9a0
> >>>       [<00000000471d7945>] ksys_write+0xf1/0x1c0
> >>>       [<00000000a963bc79>] do_syscall_64+0x3a/0x80
> >>>       [<0000000005154fc2>] entry_SYSCALL_64_after_hwframe+0x44/0xae
> >>> unreferenced object 0xffff8894253d9d00 (size 192):
> >>>     comm "nvme", pid 2632, jiffies 4295331915 (age 2937.333s)
> >>>     hex dump (first 32 bytes):
> >>>       80 50 84 a3 ff ff ff ff 80 e0 12 67 81 88 ff ff  .P.........g....
> >>>       01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> >>>     backtrace:
> >>>       [<00000000ecf84f29>] kmem_cache_alloc_trace+0x10e/0x220
> >>>       [<0000000099bbcbaa>] blk_iolatency_init+0x4e/0x380
> >>>       [<00000000e7a59176>] blkcg_init_queue+0x12e/0x610
> >>>       [<00000000aade682c>] blk_alloc_queue+0x400/0x840
> >>>       [<000000007ed43824>] blk_mq_init_queue_data+0x6a/0x100
> >>>       [<000000009f9abba5>] nvme_rdma_setup_ctrl.cold.70+0x5ee/0xb01 [nvme_rdma]
> >>>       [<00000000a309d26c>] nvme_rdma_create_ctrl+0x7e5/0xa9f [nvme_rdma]
> >>>       [<000000007d8b5cca>] nvmf_dev_write+0x44e/0xa39 [nvme_fabrics]
> >>>       [<0000000031d8624b>] vfs_write+0x17e/0x9a0
> >>>       [<00000000471d7945>] ksys_write+0xf1/0x1c0
> >>>       [<00000000a963bc79>] do_syscall_64+0x3a/0x80
> >>>       [<0000000005154fc2>] entry_SYSCALL_64_after_hwframe+0x44/0xae
> >>>
> >>>
> >>>
> >
>


-- 
Best Regards,
  Yi Zhang



