[bug report] NVMe/IB: reset_controller needs more than 1min
Yi Zhang
yi.zhang at redhat.com
Tue Dec 14 17:15:47 PST 2021
On Tue, Dec 14, 2021 at 8:01 PM Max Gurtovoy <mgurtovoy at nvidia.com> wrote:
>
>
> On 12/14/2021 12:39 PM, Sagi Grimberg wrote:
> >
> >>>> Hi Sagi
> >>>> It is still reproducible with the change, here is the log:
> >>>>
> >>>> # time nvme reset /dev/nvme0
> >>>>
> >>>> real 0m12.973s
> >>>> user 0m0.000s
> >>>> sys 0m0.006s
> >>>> # time nvme reset /dev/nvme0
> >>>>
> >>>> real 1m15.606s
> >>>> user 0m0.000s
> >>>> sys 0m0.007s
> >>>
> >>> Does it speed up if you use fewer queues (i.e. connect with -i 4)?
> >> Yes, with -i 4, it is stable at ~1.3s
> >> # time nvme reset /dev/nvme0
> >
> > So it appears that destroying a qp takes a long time on
> > IB for some reason...
> >
> >> real 0m1.225s
> >> user 0m0.000s
> >> sys 0m0.007s
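For reference, a reduced-queue connect like the one above can be requested
with nvme-cli's -i/--nr-io-queues option; something along these lines, where
the target address, port and subsystem NQN are placeholders:

# placeholder address/port/NQN; limit the connection to 4 I/O queues
nvme connect -t rdma -a <target_ip> -s 4420 -n <subsys_nqn> -i 4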
> >>
> >>>
> >>>>
> >>>> # dmesg | grep nvme
> >>>> [ 900.634877] nvme nvme0: resetting controller
> >>>> [ 909.026958] nvme nvme0: creating 40 I/O queues.
> >>>> [ 913.604297] nvme nvme0: mapped 40/0/0 default/read/poll queues.
> >>>> [ 917.600993] nvme nvme0: resetting controller
> >>>> [ 988.562230] nvme nvme0: I/O 2 QID 0 timeout
> >>>> [ 988.567607] nvme nvme0: Property Set error: 881, offset 0x14
> >>>> [ 988.608181] nvme nvme0: creating 40 I/O queues.
> >>>> [ 993.203495] nvme nvme0: mapped 40/0/0 default/read/poll queues.
> >>>>
> >>>> BTW, this issue cannot be reproduced in my NVMe/RoCE environment.
> >>>
> >>> Then I think that we need the rdma folks to help here...
> >
> > Max?
>
> It took me 12s to reset a controller with 63 IO queues with 5.16-rc3+.
>
> Can you try to reproduce it with the latest version, please?
>
> Or give the exact scenario?
Yeah, both target and client are using Mellanox Technologies MT27700
Family [ConnectX-4] HCAs. Could you try stressing "nvme reset /dev/nvme0"?
The first reset takes about 12s, and the issue can always be reproduced
on the second reset operation.
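To be concrete, my repro is roughly the following (the target address,
port and subsystem NQN are placeholders for my setup):

# connect over RDMA (IB); address/port/NQN are placeholders
nvme connect -t rdma -a <target_ip> -s 4420 -n <subsys_nqn>

# stress the reset path a few times
for i in $(seq 1 5); do
    time nvme reset /dev/nvme0
done

The first reset finishes in ~12s, and the second one already runs into
the timeout and takes more than a minute.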
>
>
--
Best Regards,
Yi Zhang