[PATCH v9] nvme-fabrics: reject I/O to offline device
Victor Gladkov
Victor.Gladkov at kioxia.com
Sun Nov 15 10:45:14 EST 2020
> -----Original Message-----
> From: Hannes Reinecke [mailto:hare at suse.de]
> Sent: Thursday, 01 October, 2020 11:55
> > ---
> > drivers/nvme/host/core.c | 49 ++++++++++++++++++++++++++++++++++++++++++-
> > drivers/nvme/host/fabrics.c | 25 +++++++++++++++++++---
> > drivers/nvme/host/fabrics.h | 5 +++++
> > drivers/nvme/host/multipath.c | 5 ++++-
> > drivers/nvme/host/nvme.h | 3 +++
> > 5 files changed, 82 insertions(+), 5 deletions(-)
> >
> I did some more experiments with this, and found that there are some issues
> with ANA handling.
> If reconnect works, but the ANA state indicates that we still can't send I/O
> (e.g. by still being in transitioning), we hit the 'requeueing I/O'
> state despite fast_io_fail_tmo being set. Not sure if that's the expected
> outcome.
> For that it might be better to move the FAILFAST_EXPIRED bit into the
> namespace, as then we could selectively clear the bit in
> nvme_failfast_work():
>
> @@ -151,12 +151,16 @@ EXPORT_SYMBOL_GPL(nvme_try_sched_reset);
>  static void nvme_failfast_work(struct work_struct *work) {
>         struct nvme_ctrl *ctrl = container_of(to_delayed_work(work),
>                         struct nvme_ctrl, failfast_work);
> +       struct nvme_ns *ns;
>
> -       if (ctrl->state != NVME_CTRL_CONNECTING)
> -               return;
> -
> -
> -       set_bit(NVME_CTRL_FAILFAST_EXPIRED, &ctrl->flags);
> +       down_read(&ctrl->namespaces_rwsem);
> +       list_for_each_entry(ns, &ctrl->namespaces, list) {
> +               if (ctrl->state != NVME_CTRL_LIVE ||
> +                   (ns->ana_state != NVME_ANA_OPTIMIZED &&
> +                    ns->ana_state != NVME_ANA_NONOPTIMIZED))
> +                       set_bit(NVME_NS_FAILFAST_EXPIRED, &ns->flags);
> +       }
> +       up_read(&ctrl->namespaces_rwsem);
>         dev_info(ctrl->device, "failfast expired\n");
>
> ...and we could leave the failfast worker running even after the controller
> transitioned to LIVE.
> Cf the attached patch for details.
>
> Cheers,
>
> Hannes
> --
I'm not sure it makes sense to move the FAILFAST_EXPIRED bit into the namespace,
because the failfast mechanism characterizes the controller as a whole.
Any comments on this?
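
To make the two options easier to compare, here is a rough userspace sketch.
It is not kernel code: the types, enum values and function names below are
made-up stand-ins that only mirror the conditions in the two variants of
nvme_failfast_work() quoted above, and locking is left out entirely.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; NOT the kernel definitions. */
enum ctrl_state { CTRL_LIVE, CTRL_CONNECTING };
enum ana_state { ANA_OPTIMIZED, ANA_NONOPTIMIZED, ANA_INACCESSIBLE, ANA_CHANGE };

struct ns_model {
	enum ana_state ana_state;
	bool failfast_expired;		/* per-namespace variant of the bit */
};

struct ctrl_model {
	enum ctrl_state state;
	bool failfast_expired;		/* controller-wide variant of the bit */
	struct ns_model *ns;
	int nr_ns;
};

/* Controller-wide variant: mirrors the lines removed by the quoted hunk. */
static void failfast_expire_ctrl(struct ctrl_model *c)
{
	if (c->state != CTRL_CONNECTING)
		return;
	c->failfast_expired = true;
}

/* Per-namespace variant: mirrors the lines added by the quoted hunk. */
static void failfast_expire_ns(struct ctrl_model *c)
{
	for (int i = 0; i < c->nr_ns; i++) {
		struct ns_model *ns = &c->ns[i];

		/*
		 * Mark the namespace if the controller is not live, or if
		 * its ANA state does not currently allow I/O.
		 */
		if (c->state != CTRL_LIVE ||
		    (ns->ana_state != ANA_OPTIMIZED &&
		     ns->ana_state != ANA_NONOPTIMIZED))
			ns->failfast_expired = true;
	}
}
```

With a live controller that has one optimized namespace and one namespace
still in the ANA change state, failfast_expire_ctrl() sets nothing, while
failfast_expire_ns() marks only the second namespace. That is the case
described in the quoted mail, and it is the behavioural difference between
the two approaches in this model.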
Regards,
Victor