[PATCH] nvme: don't wait freeze during resetting
Ming Lei
ming.lei at redhat.com
Tue Sep 20 18:25:44 PDT 2022
On Tue, Sep 20, 2022 at 11:18:33AM +0300, Sagi Grimberg wrote:
>
> > First it isn't necessary to call nvme_wait_freeze during reset.
> > For nvme-pci, if tagset isn't allocated, there can't be any inflight
> > IOs; otherwise blk_mq_update_nr_hw_queues can freeze & wait queues.
> >
> > Second, since commit bdd6316094e0 ("block: Allow unfreezing of a queue
> > while requests are in progress"), it is fine to unfreeze queue without
> > draining inflight IOs.
> >
> > Also both nvme-rdma and nvme-tcp's timeout handlers provide forward
> > progress if the controller state isn't LIVE, so it is fine to drop
> > the timeout function of nvme_wait_freeze_timeout().
>
> The rdma/tcp should probably be split to separate patches.
>
> >
> > Cc: Sagi Grimberg <sagi at grimberg.me>
> > Cc: Chao Leng <lengchao at huawei.com>
> > Cc: Keith Busch <kbusch at kernel.org>
> > Signed-off-by: Ming Lei <ming.lei at redhat.com>
> > ---
> > drivers/nvme/host/apple.c | 1 -
> > drivers/nvme/host/pci.c | 1 -
> > drivers/nvme/host/rdma.c | 13 -------------
> > drivers/nvme/host/tcp.c | 13 -------------
> > 4 files changed, 28 deletions(-)
> >
> > diff --git a/drivers/nvme/host/apple.c b/drivers/nvme/host/apple.c
> > index 5fc5ea196b40..9cd02b57fc85 100644
> > --- a/drivers/nvme/host/apple.c
> > +++ b/drivers/nvme/host/apple.c
> > @@ -1126,7 +1126,6 @@ static void apple_nvme_reset_work(struct work_struct *work)
> > anv->ctrl.queue_count = nr_io_queues + 1;
> > nvme_start_queues(&anv->ctrl);
> > - nvme_wait_freeze(&anv->ctrl);
> > blk_mq_update_nr_hw_queues(&anv->tagset, 1);
> > nvme_unfreeze(&anv->ctrl);
> > diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> > index 98864b853eef..985b216907fc 100644
> > --- a/drivers/nvme/host/pci.c
> > +++ b/drivers/nvme/host/pci.c
> > @@ -2910,7 +2910,6 @@ static void nvme_reset_work(struct work_struct *work)
> > nvme_free_tagset(dev);
> > } else {
> > nvme_start_queues(&dev->ctrl);
> > - nvme_wait_freeze(&dev->ctrl);
> > if (!dev->ctrl.tagset)
> > nvme_pci_alloc_tag_set(dev);
> > else
> > diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
> > index 3100643be299..beb0d1a6a84d 100644
> > --- a/drivers/nvme/host/rdma.c
> > +++ b/drivers/nvme/host/rdma.c
> > @@ -986,15 +986,6 @@ static int nvme_rdma_configure_io_queues(struct nvme_rdma_ctrl *ctrl, bool new)
> > if (!new) {
> > nvme_start_queues(&ctrl->ctrl);
> > - if (!nvme_wait_freeze_timeout(&ctrl->ctrl, NVME_IO_TIMEOUT)) {
> > - /*
> > - * If we timed out waiting for freeze we are likely to
> > - * be stuck. Fail the controller initialization just
> > - * to be safe.
> > - */
> > - ret = -ENODEV;
> > - goto out_wait_freeze_timed_out;
> > - }
>
> So here is the description from the patch that introduced this:
> --
> nvme-rdma: fix reset hang if controller died in the middle of a reset
>
> If the controller becomes unresponsive in the middle of a reset, we
> will hang because we are waiting for the freeze to complete, but that
> cannot happen since we have commands that are inflight holding the
> q_usage_counter, and we can't blindly fail requests that time out.
>
> So give a timeout and if we cannot wait for queue freeze before
> unfreezing, fail and have the error handling take care of how to
> proceed (either schedule a reconnect or remove the controller).
> --
>
> So if between nvme_start_queues() and the freeze (with a full wait)
> that is done in blk_mq_update_nr_hw_queues() the controller becomes
> non responsive, in this case we may hang blocking on I/O that was
> pending and requeued after nvme_start_queues().
>
> The problem is, that we cannot do any error recovery because the
> controller is in the middle of a reset/reconnect...
> So the code that you deleted was designed to detect this state, and
> reschedule another reconnect if the controller became non responsive.
>
> What is preventing this from happening now?
Please see nvme_rdma_timeout() and nvme_tcp_timeout(): if the controller
state isn't LIVE, the timed-out request is aborted.
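
To make that concrete, here is a simplified userspace model of the
decision those timeout handlers make. It is only a sketch, not the driver
code: every model_* name below is hypothetical and stands in for the
corresponding kernel state, request, and return code. The only behaviour
it is meant to show is that a request timing out while the controller is
not LIVE gets completed right away instead of waiting on error recovery.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the controller states (not the kernel enum). */
enum model_ctrl_state {
	MODEL_CTRL_LIVE,
	MODEL_CTRL_RESETTING,
	MODEL_CTRL_CONNECTING,
};

/* Hypothetical stand-ins for the block layer's timeout return codes. */
enum model_eh_return {
	MODEL_EH_DONE,		/* request was completed by the handler */
	MODEL_EH_RESET_TIMER,	/* keep the request, re-arm its timer */
};

struct model_request {
	bool completed;		/* true once the request no longer pins the queue */
};

struct model_ctrl {
	enum model_ctrl_state state;
	bool recovery_scheduled;
};

/*
 * Model of the timeout decision (assumption: mirrors the idea behind
 * nvme_rdma_timeout()/nvme_tcp_timeout(), not their exact code).
 */
static enum model_eh_return model_timeout(struct model_ctrl *ctrl,
					   struct model_request *rq)
{
	assert(ctrl && rq);

	if (ctrl->state != MODEL_CTRL_LIVE) {
		/*
		 * Reset/reconnect in progress: abort the request now so it
		 * cannot block a later freeze wait.
		 */
		rq->completed = true;
		return MODEL_EH_DONE;
	}

	/* LIVE controller: leave the request alone, start error recovery. */
	ctrl->recovery_scheduled = true;
	return MODEL_EH_RESET_TIMER;
}
```

In this model, a request that times out during CONNECTING is marked
completed without scheduling recovery, which is the forward progress
the patch description relies on.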
Thanks,
Ming