[bug report] NVMe/IB: kmemleak observed on 5.17.0-rc5 with nvme-rdma testing
Yi Zhang
yi.zhang at redhat.com
Mon Mar 21 21:58:04 PDT 2022
On Mon, Mar 21, 2022 at 5:25 PM Sagi Grimberg <sagi at grimberg.me> wrote:
>
>
> >>>>> # nvme connect to target
> >>>>> # nvme reset /dev/nvme0
> >>>>> # nvme disconnect-all
> >>>>> # sleep 10
> >>>>> # echo scan > /sys/kernel/debug/kmemleak
> >>>>> # sleep 60
> >>>>> # cat /sys/kernel/debug/kmemleak
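For scripted runs, the quoted kmemleak steps can be wrapped in a small helper. This is only a sketch: the `kmemleak_check` name, the `KMEMLEAK`/`KMEMLEAK_WAIT` override variables, and the 60-second default settle time are assumptions, not part of the original report.

```shell
# Sketch of a scripted kmemleak check, based on the quoted steps.
# KMEMLEAK and KMEMLEAK_WAIT are hypothetical overrides (useful for
# testing); by default it targets the real debugfs file and waits 60s.
kmemleak_check() {
    local kfile="${KMEMLEAK:-/sys/kernel/debug/kmemleak}"
    echo scan >> "$kfile" 2>/dev/null || true  # trigger a scan
    sleep "${KMEMLEAK_WAIT:-60}"               # let the scan settle
    # kmemleak reports each leak as "unreferenced object 0x... (size N):"
    if grep -q "unreferenced object" "$kfile"; then
        echo "leaks detected"
        return 1
    fi
    echo "no leaks"
}
```

In the reported flow this would run after `nvme disconnect-all` and the short sleep, mirroring the quoted commands.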
> >>>>>
> >>>> Thanks I was able to repro it with the above commands.
> >>>>
> >>>> Still not clear where the leak is, but I do see some non-symmetric
> >>>> code in the error flows that we need to fix. Plus the keep-alive
> >>>> timing movement.
> >>>>
> >>>> It will take some time for me to debug this.
> >>>>
> >>>> Can you repro it with the tcp transport as well?
> >>>
> >>> Yes, nvme/tcp can also reproduce it, here is the log:
>
> Looks like the offending commit was 8e141f9eb803 ("block: drain file
> system I/O on del_gendisk"), which moved the call site for a reason.
>
> However, rq_qos_exit() should be reentrant-safe, so can you verify
> that this change eliminates the issue as well?
Yes, this change also fixed the kmemleak, thanks.
> --
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 94bf37f8e61d..6ccc02a41f25 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -323,6 +323,7 @@ void blk_cleanup_queue(struct request_queue *q)
>
> blk_queue_flag_set(QUEUE_FLAG_DEAD, q);
>
> + rq_qos_exit(q);
> blk_sync_queue(q);
> if (queue_is_mq(q)) {
> blk_mq_cancel_work_sync(q);
> --
>
--
Best Regards,
Yi Zhang