[PATCH V4] nvme-fabrics: reject I/O to offline device
Victor Gladkov
Victor.Gladkov at kioxia.com
Tue Jun 30 09:20:49 EDT 2020
Hi Sagi.
Thank you for reviewing.
I'm going pull out new patch updated according to your comments.
In the meantime, see my comments below.
> > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index
set_bit(NVME_CTRL_FAILFAST_EXPIRED, &ctrl->flags);
> > + dev_info(ctrl->device, "failfast expired set for controller %s\n",
> > + ctrl->opts->subsysnqn);
>
> What does the subsysnqn has to do with anything? just keep "failfast expired"[VG]
[VG] I agree with you
>
> > + nvme_kick_requeue_lists(ctrl);
> > +out:
> > + spin_unlock_irq(&ctrl->lock);
>
> What is the lock protecting against? I don't see why this is needed.
>
[VG] You're right.
> > +}
> > +
> > +static inline void nvme_start_failfast_work(struct nvme_ctrl *ctrl) {
> > + if (!ctrl->opts || ctrl->opts->fast_io_fail_tmo == 0)
>
> the timeout disabled should be -1, 0 is to fail I/O right away.
>
> Maybe also add "off" string as a -1 equivalent.
>
[VG] It's ok to me. I'll add "off" string option.
> > diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
> > index 2a6c819..3a39714 100644
> > --- a/drivers/nvme/host/fabrics.c
> > +++ b/drivers/nvme/host/fabrics.c
> > @@ -631,6 +633,7 @@ static int nvmf_parse_options(struct
> nvmf_ctrl_options *opts,
> > opts->reconnect_delay = NVMF_DEF_RECONNECT_DELAY;
> > opts->kato = NVME_DEFAULT_KATO;
> > opts->duplicate_connect = false;
> > + opts->fast_io_fail_tmo = NVMF_DEF_FAIL_FAST_TMO;
> > opts->hdr_digest = false;
> > opts->data_digest = false;
> > opts->tos = -1; /* < 0 == use transport default */ @@ -751,6
> > +754,19 @@ static int nvmf_parse_options(struct nvmf_ctrl_options *opts,
> > pr_warn("ctrl_loss_tmo < 0 will reconnect
> forever\n");
> > ctrl_loss_tmo = token;
> > break;
> > + case NVMF_OPT_FAIL_FAST_TMO:
> > + if (match_int(args, &token)) {
> > + ret = -EINVAL;
> > + goto out;
> > + }
> > +
> > + if (token)
> > + pr_warn("fail_fast_tmo != 0, I/O will fail on"
> > + " reconnect controller after %d sec\n",
> > + token);
> > +
> > + opts->fast_io_fail_tmo = token;
> > + break;
>
> Again, -1 should disable, not 0.
[VG] You're right.
>
> > case NVMF_OPT_HOSTNQN:
> > if (opts->host) {
> > pr_err("hostnqn already user-assigned: %s\n", @@ -
> 881,11 +897,14
> > @@ static int nvmf_parse_options(struct nvmf_ctrl_options *opts,
> > opts->nr_poll_queues = 0;
> > opts->duplicate_connect = true;
> > }
> > - if (ctrl_loss_tmo < 0)
> > + if (ctrl_loss_tmo < 0) {
> > opts->max_reconnects = -1;
> > - else
> > - opts->max_reconnects = DIV_ROUND_UP(ctrl_loss_tmo,
> > - opts->reconnect_delay);
> > + } else {
> > + if (ctrl_loss_tmo < opts->fast_io_fail_tmo)
> > + pr_warn("failfast tmo (%d) larger than controller "
> > + "loss tmo (%d)\n",
> > + opts->fast_io_fail_tmo, ctrl_loss_tmo);
> > + }
>
> it can be that ctrl_loss_tmo=-1 which means reconnect forever.
[VG] OK
> > diff --git a/drivers/nvme/host/multipath.c
> > b/drivers/nvme/host/multipath.c index 54603bd..4b66504 100644
> > --- a/drivers/nvme/host/multipath.c
> > +++ b/drivers/nvme/host/multipath.c
> > @@ -278,9 +278,13 @@ static bool nvme_available_path(struct
> > nvme_ns_head *head)
> >
> > list_for_each_entry_rcu(ns, &head->list, siblings) {
> > switch (ns->ctrl->state) {
> > + case NVME_CTRL_CONNECTING:
> > + if (test_bit(NVME_CTRL_FAILFAST_EXPIRED,
> > + &ns->ctrl->flags))
> > + break;
> > + /* fallthru */
>
> I absolutely don't see why this needs to happen. this setting is on a controller,
> so this absolutely should not affect the mpath device node. so nak on this.
[VG] In case of multipath is enabled we need announce the path as unavailable,
otherwise the command will requeue but not aborted.
More information about the Linux-nvme
mailing list