[bug report] NVMe/IB: reset_controller needs more than 1 min
Yi Zhang
yi.zhang at redhat.com
Mon Dec 13 09:05:09 PST 2021
On Mon, Dec 13, 2021 at 5:05 PM Sagi Grimberg <sagi at grimberg.me> wrote:
>
>
> >>>>>> Hello
> >>>>>>
> >>>>>> Gentle ping here; this issue still exists on the latest 5.13-rc7
> >>>>>>
> >>>>>> # time nvme reset /dev/nvme0
> >>>>>>
> >>>>>> real 0m12.636s
> >>>>>> user 0m0.002s
> >>>>>> sys 0m0.005s
> >>>>>> # time nvme reset /dev/nvme0
> >>>>>>
> >>>>>> real 0m12.641s
> >>>>>> user 0m0.000s
> >>>>>> sys 0m0.007s
> >>>>>
> >>>>> Strange that even normal resets take so long...
> >>>>> What device are you using?
> >>>>
> >>>> Hi Sagi
> >>>>
> >>>> Here is the device info:
> >>>> Mellanox Technologies MT27700 Family [ConnectX-4]
> >>>>
> >>>>>
> >>>>>> # time nvme reset /dev/nvme0
> >>>>>>
> >>>>>> real 1m16.133s
> >>>>>> user 0m0.000s
> >>>>>> sys 0m0.007s
> >>>>>
> >>>>> There seems to be a spurious command timeout here, but maybe this
> >>>>> is due to the fact that the queues take so long to connect and
> >>>>> the target expires the keep-alive timer.
> >>>>>
> >>>>> Does this patch help?
> >>>>
> >>>> The issue still exists; let me know if you need more testing. :)
> >>>
> >>> Hi Sagi
> >>> Ping, this issue can still be reproduced on the latest
> >>> linux-block/for-next. Do you have a chance to recheck it? Thanks.
> >>
> >> Can you check if it happens with the below patch:
> >
> > Hi Sagi
> > It is still reproducible with the change, here is the log:
> >
> > # time nvme reset /dev/nvme0
> >
> > real 0m12.973s
> > user 0m0.000s
> > sys 0m0.006s
> > # time nvme reset /dev/nvme0
> >
> > real 1m15.606s
> > user 0m0.000s
> > sys 0m0.007s
>
> Does it speed up if you use fewer queues (i.e. connect with -i 4)?
Yes, with -i 4, it is a stable ~1.3s:
# time nvme reset /dev/nvme0
real 0m1.225s
user 0m0.000s
sys 0m0.007s
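
In case it helps with reproducing, the -i 4 run was connected roughly
like this (the target address and NQN below are placeholders, not the
real setup):

# nvme connect -t rdma -a <target_ip> -s 4420 -n <subsys_nqn> -i 4
# time nvme reset /dev/nvme0

So limiting the number of I/O queues at connect time avoids the long
reset; with the default of one queue per CPU (40 here), the reset can
run into the keep-alive expiry you suspected.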
>
> >
> > # dmesg | grep nvme
> > [ 900.634877] nvme nvme0: resetting controller
> > [ 909.026958] nvme nvme0: creating 40 I/O queues.
> > [ 913.604297] nvme nvme0: mapped 40/0/0 default/read/poll queues.
> > [ 917.600993] nvme nvme0: resetting controller
> > [ 988.562230] nvme nvme0: I/O 2 QID 0 timeout
> > [ 988.567607] nvme nvme0: Property Set error: 881, offset 0x14
> > [ 988.608181] nvme nvme0: creating 40 I/O queues.
> > [ 993.203495] nvme nvme0: mapped 40/0/0 default/read/poll queues.
> >
> > BTW, this issue cannot be reproduced on my NVMe/RoCE environment.
>
> Then I think that we need the rdma folks to help here...
>
--
Best Regards,
Yi Zhang