[PATCH V4] nvme-fabrics: reject I/O to offline device

Victor Gladkov Victor.Gladkov at kioxia.com
Tue Jun 30 09:20:49 EDT 2020


Hi Sagi.
Thank you for reviewing.
I'm going pull out new patch updated according to your comments.
In the meantime, see my comments below.

> > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index
set_bit(NVME_CTRL_FAILFAST_EXPIRED, &ctrl->flags);
> > +	dev_info(ctrl->device, "failfast expired set for controller %s\n",
> > +		ctrl->opts->subsysnqn);
> 
> What does the subsysnqn has to do with anything? just keep "failfast expired"[VG] 


[VG] I agree with you

> 
> > +	nvme_kick_requeue_lists(ctrl);
> > +out:
> > +	spin_unlock_irq(&ctrl->lock);
> 
> What is the lock protecting against? I don't see why this is needed.
> 

[VG]  You're right.

> > +}
> > +
> > +static inline void nvme_start_failfast_work(struct nvme_ctrl *ctrl) {
> > +	if (!ctrl->opts || ctrl->opts->fast_io_fail_tmo == 0)
> 
> the timeout disabled should be -1, 0 is to fail I/O right away.
> 
> Maybe also add "off" string as a -1 equivalent.
> 

[VG]  It's ok to me. I'll add "off" string option.

> > diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
> > index 2a6c819..3a39714 100644
> > --- a/drivers/nvme/host/fabrics.c
> > +++ b/drivers/nvme/host/fabrics.c
> > @@ -631,6 +633,7 @@ static int nvmf_parse_options(struct
> nvmf_ctrl_options *opts,
> >   	opts->reconnect_delay = NVMF_DEF_RECONNECT_DELAY;
> >   	opts->kato = NVME_DEFAULT_KATO;
> >   	opts->duplicate_connect = false;
> > +	opts->fast_io_fail_tmo = NVMF_DEF_FAIL_FAST_TMO;
> >   	opts->hdr_digest = false;
> >   	opts->data_digest = false;
> >   	opts->tos = -1; /* < 0 == use transport default */ @@ -751,6
> > +754,19 @@ static int nvmf_parse_options(struct nvmf_ctrl_options *opts,
> >   				pr_warn("ctrl_loss_tmo < 0 will reconnect
> forever\n");
> >   			ctrl_loss_tmo = token;
> >   			break;
> > +		case NVMF_OPT_FAIL_FAST_TMO:
> > +			if (match_int(args, &token)) {
> > +				ret = -EINVAL;
> > +				goto out;
> > +			}
> > +
> > +			if (token)
> > +				pr_warn("fail_fast_tmo != 0, I/O will fail on"
> > +					" reconnect controller after %d sec\n",
> > +					token);
> > +
> > +			opts->fast_io_fail_tmo = token;
> > +			break;
> 
> Again, -1 should disable, not 0.

[VG] You're right.

> 
> >   		case NVMF_OPT_HOSTNQN:
> >   			if (opts->host) {
> >   				pr_err("hostnqn already user-assigned: %s\n", @@ -
> 881,11 +897,14
> > @@ static int nvmf_parse_options(struct nvmf_ctrl_options *opts,
> >   		opts->nr_poll_queues = 0;
> >   		opts->duplicate_connect = true;
> >   	}
> > -	if (ctrl_loss_tmo < 0)
> > +	if (ctrl_loss_tmo < 0) {
> >   		opts->max_reconnects = -1;
> > -	else
> > -		opts->max_reconnects = DIV_ROUND_UP(ctrl_loss_tmo,
> > -						opts->reconnect_delay);
> > +	} else {
> > +		if (ctrl_loss_tmo < opts->fast_io_fail_tmo)
> > +			pr_warn("failfast tmo (%d) larger than controller "
> > +				"loss tmo (%d)\n",
> > +				opts->fast_io_fail_tmo, ctrl_loss_tmo);
> > +	}
> 
> it can be that ctrl_loss_tmo=-1 which means reconnect forever.

 [VG] OK

> > diff --git a/drivers/nvme/host/multipath.c
> > b/drivers/nvme/host/multipath.c index 54603bd..4b66504 100644
> > --- a/drivers/nvme/host/multipath.c
> > +++ b/drivers/nvme/host/multipath.c
> > @@ -278,9 +278,13 @@ static bool nvme_available_path(struct
> > nvme_ns_head *head)
> >
> >   	list_for_each_entry_rcu(ns, &head->list, siblings) {
> >   		switch (ns->ctrl->state) {
> > +		case NVME_CTRL_CONNECTING:
> > +			if (test_bit(NVME_CTRL_FAILFAST_EXPIRED,
> > +				     &ns->ctrl->flags))
> > +				break;
> > +			/* fallthru */
> 
> I absolutely don't see why this needs to happen. this setting is on a controller,
> so this absolutely should not affect the mpath device node. so nak on this.

[VG] In case of multipath is enabled we need announce the path as unavailable,
 otherwise the command will requeue but not aborted.



More information about the Linux-nvme mailing list