[bug report] NVMe/IB: kmemleak observed on 5.17.0-rc5 with nvme-rdma testing

Max Gurtovoy mgurtovoy at nvidia.com
Thu Mar 10 03:52:15 PST 2022


On 3/9/2022 12:59 AM, Yi Zhang wrote:
> On Tue, Mar 8, 2022 at 11:51 PM Max Gurtovoy <mgurtovoy at nvidia.com> wrote:
>> Hi Yi Zhang,
>>
>> Please send the commands to repro.
>>
>> I run the following with no success to repro:
>>
>> for i in `seq 100`; do echo $i &&  cat /sys/kernel/debug/kmemleak &&
>> echo clear > /sys/kernel/debug/kmemleak && nvme reset /dev/nvme2 &&
>> sleep 5 && echo scan > /sys/kernel/debug/kmemleak ; done
> Hi Max
> Sorry, I should add more details when I report it.
> The kmemleak observed when I was reproducing the "nvme reset" timeout
> issue we discussed before[1], and the cmd I used are[2]
>
> [1]
> https://lore.kernel.org/linux-nvme/CAHj4cs_ir917u7Up5PBfwWpZtnVLey69pXXNjFNAjbqQ5vwU0w@mail.gmail.com/T/#m5e6dcc434fc1925b18047c348226cfbc48ffbd14
> [2]
> # nvme connect to target
> # nvme reset /dev/nvme0
> # nvme disconnect-all
> # sleep 10
> # echo scan > /sys/kernel/debug/kmemleak
> # sleep 60
> # cat /sys/kernel/debug/kmemleak
>
Thanks, I was able to repro it with the above commands.

It is still not clear where the leak is, but I do see some non-symmetric
code in the error flows that we need to fix, plus the keep-alive timing
movement.

It will take some time for me to debug this.

Can you repro it with the tcp transport as well?
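
For comparing transports, the sequence in [2] could be wrapped in a small
script along these lines. This is only a rough sketch: TRANSPORT, TRADDR,
TRSVCID, NQN and DEV are placeholders I made up for illustration (they are
not values from the report), and the DRY_RUN switch is an added convenience
that just prints the commands by default.

```shell
#!/bin/sh
# Sketch of the repro sequence from [2], parameterized by transport.
# All values below are hypothetical placeholders, not taken from the report.
TRANSPORT="${TRANSPORT:-rdma}"      # set to "tcp" to try the tcp transport
TRADDR="${TRADDR:-192.168.0.1}"     # hypothetical target address
TRSVCID="${TRSVCID:-4420}"          # hypothetical target port
NQN="${NQN:-testnqn}"               # hypothetical subsystem NQN
DEV="${DEV:-/dev/nvme0}"            # controller device to reset
KMEMLEAK=/sys/kernel/debug/kmemleak
DRY_RUN="${DRY_RUN:-1}"             # 1: only print the commands

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Same order as [2]: connect, reset, disconnect, then scan and read kmemleak.
repro() {
    run nvme connect -t "$TRANSPORT" -a "$TRADDR" -s "$TRSVCID" -n "$NQN"
    run nvme reset "$DEV"
    run nvme disconnect-all
    run sleep 10
    run sh -c "echo scan > $KMEMLEAK"
    run sleep 60
    run cat "$KMEMLEAK"
}

repro
```

Saved under some name (say repro.sh, also hypothetical), it could be run as
root with "TRANSPORT=tcp DRY_RUN=0 sh repro.sh" once the placeholders match
the setup.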

Maybe add some debug prints to catch the exact flow in which it happens?

>> -Max.
>>
>> On 2/21/2022 1:37 PM, Yi Zhang wrote:
>>> Hello
>>>
>>> Below kmemleak triggered when I do nvme connect/reset/disconnect
>>> operations on latest 5.17.0-rc5, pls check it.
>>>
>>> # cat /sys/kernel/debug/kmemleak
>>> unreferenced object 0xffff8883e398bc00 (size 192):
>>>     comm "nvme", pid 2632, jiffies 4295317772 (age 2951.476s)
>>>     hex dump (first 32 bytes):
>>>       80 50 84 a3 ff ff ff ff 70 d4 12 67 81 88 ff ff  .P......p..g....
>>>       01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>>>     backtrace:
>>>       [<00000000ecf84f29>] kmem_cache_alloc_trace+0x10e/0x220
>>>       [<0000000099bbcbaa>] blk_iolatency_init+0x4e/0x380
>>>       [<00000000e7a59176>] blkcg_init_queue+0x12e/0x610
>>>       [<00000000aade682c>] blk_alloc_queue+0x400/0x840
>>>       [<000000007ed43824>] blk_mq_init_queue_data+0x6a/0x100
>>>       [<00000000cbff6d39>] nvme_rdma_setup_ctrl+0x4ca/0x15f0 [nvme_rdma]
>>>       [<00000000a309d26c>] nvme_rdma_create_ctrl+0x7e5/0xa9f [nvme_rdma]
>>>       [<000000007d8b5cca>] nvmf_dev_write+0x44e/0xa39 [nvme_fabrics]
>>>       [<0000000031d8624b>] vfs_write+0x17e/0x9a0
>>>       [<00000000471d7945>] ksys_write+0xf1/0x1c0
>>>       [<00000000a963bc79>] do_syscall_64+0x3a/0x80
>>>       [<0000000005154fc2>] entry_SYSCALL_64_after_hwframe+0x44/0xae
>>> unreferenced object 0xffff8883e398a700 (size 192):
>>>     comm "nvme", pid 2632, jiffies 4295317782 (age 2951.466s)
>>>     hex dump (first 32 bytes):
>>>       80 50 84 a3 ff ff ff ff 60 c8 12 67 81 88 ff ff  .P......`..g....
>>>       01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>>>     backtrace:
>>>       [<00000000ecf84f29>] kmem_cache_alloc_trace+0x10e/0x220
>>>       [<0000000099bbcbaa>] blk_iolatency_init+0x4e/0x380
>>>       [<00000000e7a59176>] blkcg_init_queue+0x12e/0x610
>>>       [<00000000aade682c>] blk_alloc_queue+0x400/0x840
>>>       [<000000007ed43824>] blk_mq_init_queue_data+0x6a/0x100
>>>       [<000000004f80b965>] nvme_rdma_setup_ctrl+0xf37/0x15f0 [nvme_rdma]
>>>       [<00000000a309d26c>] nvme_rdma_create_ctrl+0x7e5/0xa9f [nvme_rdma]
>>>       [<000000007d8b5cca>] nvmf_dev_write+0x44e/0xa39 [nvme_fabrics]
>>>       [<0000000031d8624b>] vfs_write+0x17e/0x9a0
>>>       [<00000000471d7945>] ksys_write+0xf1/0x1c0
>>>       [<00000000a963bc79>] do_syscall_64+0x3a/0x80
>>>       [<0000000005154fc2>] entry_SYSCALL_64_after_hwframe+0x44/0xae
>>> unreferenced object 0xffff8894253d9d00 (size 192):
>>>     comm "nvme", pid 2632, jiffies 4295331915 (age 2937.333s)
>>>     hex dump (first 32 bytes):
>>>       80 50 84 a3 ff ff ff ff 80 e0 12 67 81 88 ff ff  .P.........g....
>>>       01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>>>     backtrace:
>>>       [<00000000ecf84f29>] kmem_cache_alloc_trace+0x10e/0x220
>>>       [<0000000099bbcbaa>] blk_iolatency_init+0x4e/0x380
>>>       [<00000000e7a59176>] blkcg_init_queue+0x12e/0x610
>>>       [<00000000aade682c>] blk_alloc_queue+0x400/0x840
>>>       [<000000007ed43824>] blk_mq_init_queue_data+0x6a/0x100
>>>       [<000000009f9abba5>] nvme_rdma_setup_ctrl.cold.70+0x5ee/0xb01 [nvme_rdma]
>>>       [<00000000a309d26c>] nvme_rdma_create_ctrl+0x7e5/0xa9f [nvme_rdma]
>>>       [<000000007d8b5cca>] nvmf_dev_write+0x44e/0xa39 [nvme_fabrics]
>>>       [<0000000031d8624b>] vfs_write+0x17e/0x9a0
>>>       [<00000000471d7945>] ksys_write+0xf1/0x1c0
>>>       [<00000000a963bc79>] do_syscall_64+0x3a/0x80
>>>       [<0000000005154fc2>] entry_SYSCALL_64_after_hwframe+0x44/0xae
>>>
>>>
>>>
>


