[bug report] NVMe/IB: reset_controller need more than 1min
Yi Zhang
yi.zhang at redhat.com
Sun Dec 12 22:12:24 PST 2021
On Sun, Dec 12, 2021 at 5:45 PM Sagi Grimberg <sagi at grimberg.me> wrote:
>
>
>
> On 12/11/21 5:01 AM, Yi Zhang wrote:
> > On Fri, Jun 25, 2021 at 12:14 AM Yi Zhang <yi.zhang at redhat.com> wrote:
> >>
> >> On Thu, Jun 24, 2021 at 5:32 AM Sagi Grimberg <sagi at grimberg.me> wrote:
> >>>
> >>>
> >>>> Hello
> >>>>
> >>>> Gentle ping here, this issue still exists on the latest 5.13-rc7.
> >>>>
> >>>> # time nvme reset /dev/nvme0
> >>>>
> >>>> real 0m12.636s
> >>>> user 0m0.002s
> >>>> sys 0m0.005s
> >>>> # time nvme reset /dev/nvme0
> >>>>
> >>>> real 0m12.641s
> >>>> user 0m0.000s
> >>>> sys 0m0.007s
> >>>
> >>> Strange that even normal resets take so long...
> >>> What device are you using?
> >>
> >> Hi Sagi
> >>
> >> Here is the device info:
> >> Mellanox Technologies MT27700 Family [ConnectX-4]
> >>
> >>>
> >>>> # time nvme reset /dev/nvme0
> >>>>
> >>>> real 1m16.133s
> >>>> user 0m0.000s
> >>>> sys 0m0.007s
> >>>
> >>> There seems to be a spurious command timeout here, but maybe that
> >>> is because the queues take so long to connect that the target's
> >>> keep-alive timer expires.
> >>>
> >>> Does this patch help?
> >>
> >> The issue still exists; let me know if you need more testing. :)
> >
> > Hi Sagi
> > Ping. This issue can still be reproduced on the latest
> > linux-block/for-next. Could you recheck it when you have a chance? Thanks.
>
> Can you check if it happens with the below patch:
Hi Sagi
It is still reproducible with the change; here is the log:
# time nvme reset /dev/nvme0
real 0m12.973s
user 0m0.000s
sys 0m0.006s
# time nvme reset /dev/nvme0
real 1m15.606s
user 0m0.000s
sys 0m0.007s
# dmesg | grep nvme
[ 900.634877] nvme nvme0: resetting controller
[ 909.026958] nvme nvme0: creating 40 I/O queues.
[ 913.604297] nvme nvme0: mapped 40/0/0 default/read/poll queues.
[ 917.600993] nvme nvme0: resetting controller
[ 988.562230] nvme nvme0: I/O 2 QID 0 timeout
[ 988.567607] nvme nvme0: Property Set error: 881, offset 0x14
[ 988.608181] nvme nvme0: creating 40 I/O queues.
[ 993.203495] nvme nvme0: mapped 40/0/0 default/read/poll queues.
BTW, this issue cannot be reproduced in my NVMe/RoCE environment.
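
For reference, if I'm decoding the dmesg above correctly: offset 0x14 is
NVME_REG_CC, and status 881 (0x371) is NVME_SC_HOST_ABORTED_CMD, so the
command that times out is the Property Set carrying the CC write that
disables the controller at the start of the reset. On the host side that
path looks roughly like this (simplified from v5.13-era
drivers/nvme/host/core.c; a sketch for orientation, not the exact source):

int nvme_disable_ctrl(struct nvme_ctrl *ctrl)
{
	int ret;

	/* Clear CC.EN to ask the controller to transition to idle */
	ctrl->ctrl_config &= ~NVME_CC_SHN_MASK;
	ctrl->ctrl_config &= ~NVME_CC_ENABLE;

	/*
	 * For fabrics controllers ->reg_write32() is nvmf_reg_write32(),
	 * which sends a Property Set command for offset 0x14 (NVME_REG_CC)
	 * -- the command that shows up as timed out in the log above.
	 */
	ret = ctrl->ops->reg_write32(ctrl, NVME_REG_CC, ctrl->ctrl_config);
	if (ret)
		return ret;

	/* Wait for CSTS.RDY to clear before proceeding with the reset */
	return nvme_wait_ready(ctrl, ctrl->cap, false);
}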
> --
> diff --git a/drivers/nvme/target/fabrics-cmd.c b/drivers/nvme/target/fabrics-cmd.c
> index f91a56180d3d..6e5aadfb07a0 100644
> --- a/drivers/nvme/target/fabrics-cmd.c
> +++ b/drivers/nvme/target/fabrics-cmd.c
> @@ -191,6 +191,14 @@ static u16 nvmet_install_queue(struct nvmet_ctrl *ctrl, struct nvmet_req *req)
>  		}
>  	}
>
> +	/*
> +	 * Controller establishment flow may take some time, and the host may not
> +	 * send us keep-alive during this period, hence reset the
> +	 * traffic based keep-alive timer so we don't trigger a
> +	 * controller teardown as a result of a keep-alive expiration.
> +	 */
> +	ctrl->reset_tbkas = true;
> +
>  	return 0;
>
> err:
> --
>
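
For anyone following the thread: the reset_tbkas flag the patch sets is
consumed by the target's keep-alive work, which (roughly, from v5.13-era
drivers/nvme/target/core.c; exact details may differ across kernels)
rearms itself once instead of declaring a fatal error:

static void nvmet_keep_alive_timer(struct work_struct *work)
{
	struct nvmet_ctrl *ctrl = container_of(to_delayed_work(work),
			struct nvmet_ctrl, ka_work);
	bool reset_tbkas = ctrl->reset_tbkas;

	ctrl->reset_tbkas = false;
	if (reset_tbkas) {
		/*
		 * Command traffic (or, with the patch above, a freshly
		 * installed queue) was seen since the timer was armed:
		 * push the deadline out by another kato period.
		 */
		pr_debug("ctrl %d reschedule traffic based keep-alive timer\n",
			ctrl->cntlid);
		schedule_delayed_work(&ctrl->ka_work, ctrl->kato * HZ);
		return;
	}

	pr_err("ctrl %d keep-alive timer (%d seconds) expired!\n",
		ctrl->cntlid, ctrl->kato);

	nvmet_ctrl_fatal_error(ctrl);
}

So setting ctrl->reset_tbkas = true in nvmet_install_queue buys each
queue connect another kato interval, which should cover the slow IB
queue setup seen above.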
--
Best Regards,
Yi Zhang