[PATCH v9] nvme-fabrics: reject I/O to offline device

Victor Gladkov Victor.Gladkov at kioxia.com
Sun Nov 15 10:45:14 EST 2020


> -----Original Message-----
> From: Hannes Reinecke [mailto:hare at suse.de]
> Sent: Thursday, 01 October, 2020 11:55

> > ---
> >   drivers/nvme/host/core.c      | 49 ++++++++++++++++++++++++++++++++++++++++++-
> >   drivers/nvme/host/fabrics.c   | 25 +++++++++++++++++++---
> >   drivers/nvme/host/fabrics.h   |  5 +++++
> >   drivers/nvme/host/multipath.c |  5 ++++-
> >   drivers/nvme/host/nvme.h      |  3 +++
> >   5 files changed, 82 insertions(+), 5 deletions(-)
> >
> I did some more experiments with this, and found that there are some issues
> with ANA handling.
> If reconnect works, but the ANA state indicates that we still can't send I/O
> (e.g. because it is still transitioning), we hit the 'requeueing I/O' state
> despite fast_io_fail_tmo being set. Not sure if that's the expected outcome.
> For that it might be better to move the FAILFAST_EXPIRED bit into the
> namespace, as then we could selectively clear the bit in
> nvme_failfast_work():
> 
> @@ -151,12 +151,16 @@ EXPORT_SYMBOL_GPL(nvme_try_sched_reset);
>   static void nvme_failfast_work(struct work_struct *work) {
>          struct nvme_ctrl *ctrl = container_of(to_delayed_work(work),
>                          struct nvme_ctrl, failfast_work);
> +       struct nvme_ns *ns;
> 
> -       if (ctrl->state != NVME_CTRL_CONNECTING)
> -               return;
> -
> -
> -       set_bit(NVME_CTRL_FAILFAST_EXPIRED, &ctrl->flags);
> +       down_read(&ctrl->namespaces_rwsem);
> +       list_for_each_entry(ns, &ctrl->namespaces, list) {
> +               if (ctrl->state != NVME_CTRL_LIVE ||
> +                   (ns->ana_state != NVME_ANA_OPTIMIZED &&
> +                    ns->ana_state != NVME_ANA_NONOPTIMIZED))
> +                       set_bit(NVME_NS_FAILFAST_EXPIRED, &ns->flags);
> +       }
> +       up_read(&ctrl->namespaces_rwsem);
>          dev_info(ctrl->device, "failfast expired\n");
> 
> ...and we could leave the failfast worker running even after the controller
> transitioned to LIVE.
> Cf the attached patch for details.
> 
> Cheers,
> 
> Hannes
> --

I'm not sure it makes sense to move the FAILFAST_EXPIRED bit into the namespace,
because the failfast mechanism characterizes the controller as a whole.
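
To make the difference concrete, here is a minimal userspace sketch of the two
placements. It is not kernel code: the types and function names (model_ctrl,
model_ns, failfast_expire_ctrl, failfast_expire_ns) are simplified stand-ins
that I made up for illustration, and the boolean fields only model the
NVME_CTRL_FAILFAST_EXPIRED and NVME_NS_FAILFAST_EXPIRED bits from the diff
quoted above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel types; not the real definitions. */
enum ctrl_state { CTRL_LIVE, CTRL_CONNECTING };
enum ana_state { ANA_OPTIMIZED, ANA_NONOPTIMIZED, ANA_INACCESSIBLE, ANA_CHANGE };

struct model_ns {
	enum ana_state ana_state;
	bool failfast_expired;	/* models NVME_NS_FAILFAST_EXPIRED */
};

struct model_ctrl {
	enum ctrl_state state;
	bool failfast_expired;	/* models NVME_CTRL_FAILFAST_EXPIRED */
	struct model_ns *ns;	/* array of namespaces (hypothetical layout) */
	size_t nr_ns;
};

/* Controller-wide variant: a single bit, set only while still reconnecting. */
static void failfast_expire_ctrl(struct model_ctrl *ctrl)
{
	if (ctrl->state != CTRL_CONNECTING)
		return;
	ctrl->failfast_expired = true;
}

/*
 * Per-namespace variant: the bit is set on every namespace that cannot take
 * I/O, either because the controller is not live or because the ANA state of
 * that namespace is neither optimized nor non-optimized.
 */
static void failfast_expire_ns(struct model_ctrl *ctrl)
{
	for (size_t i = 0; i < ctrl->nr_ns; i++) {
		struct model_ns *ns = &ctrl->ns[i];

		if (ctrl->state != CTRL_LIVE ||
		    (ns->ana_state != ANA_OPTIMIZED &&
		     ns->ana_state != ANA_NONOPTIMIZED))
			ns->failfast_expired = true;
	}
}
```

With a live controller and one namespace still in ANA transition, the
controller-wide variant marks nothing, while the per-namespace variant marks
only the transitioning namespace.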

Any comments on this?

Regards,
Victor



