[PATCH v8] nvme-fabrics: reject I/O to offline device

Victor Gladkov Victor.Gladkov at kioxia.com
Sun Sep 27 07:48:44 EDT 2020


> On 9/18/20 11:39 PM, Sagi Grimberg wrote:
> 
> On 9/6/20 11:21 PM, Victor Gladkov wrote:
> > Commands get stuck while the host NVMe-oF controller is in the
> > reconnect state. The NVMe controller enters the reconnect state when
> > it loses the connection with the target. It then tries to reconnect
> > every 10 seconds (default) until the reconnection succeeds or the
> > reconnect timeout is reached. The default reconnect timeout is 10
> > minutes.
> >
> > Applications expect commands to complete with success or error within
> > a certain timeout (30 seconds by default). The NVMe host enforces
> > that timeout while it is connected; nevertheless, during reconnection
> > the timeout is not enforced, and commands may get stuck for a long
> > period or even forever.
> >
> > To fix this long delay due to the default timeout, we introduce a new
> > session parameter "fast_io_fail_tmo". The timeout is measured in
> > seconds from the start of the controller reconnect; any command beyond
> > that timeout is rejected. The new parameter value may be passed during
> > 'connect'. The default value of 0 means no timeout (similar to the
> > current behavior).
> 
> I think you mean here -1.

You're right. It should be -1.
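
For clarity, a simplified sketch of the option plumbing (not the verbatim
patch; the real parser lives in drivers/nvme/host/fabrics.c, and the
constant/field names here follow the patch's style), with -1 as the
"no timeout" default:

	/* option table entry */
	{ NVMF_OPT_FAIL_FAST_TMO,	"fast_io_fail_tmo=%d"	},

	/* in nvmf_parse_options(): -1 means never fail fast */
	opts->fast_io_fail_tmo = -1;

	case NVMF_OPT_FAIL_FAST_TMO:
		if (match_int(args, &token)) {
			ret = -EINVAL;
			goto out;
		}
		if (token >= 0)
			pr_warn("I/O will fail on reconnect after %d sec\n",
				token);
		opts->fast_io_fail_tmo = token;
		break;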

> 
> >
> > We add a new controller flag, NVME_CTRL_FAILFAST_EXPIRED, and a
> > respective delayed work item that updates that flag.
> >
> > When the controller enters the CONNECTING state, we schedule the
> > delayed work based on the failfast timeout value. If the controller
> > transitions out of CONNECTING, we terminate the delayed work item and
> > ensure failfast_expired is false. If the delayed work item expires, we
> > set the NVME_CTRL_FAILFAST_EXPIRED flag to true.
> >
> > We also update the nvmf_fail_nonready_command() and
> > nvme_available_path() functions to check the
> > NVME_CTRL_FAILFAST_EXPIRED controller flag.
> >
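For nvmf_fail_nonready_command(), the idea is to stop requeueing
(BLK_STS_RESOURCE) once failfast has expired, and instead complete the
command with a host path error (again a simplified sketch, not the
verbatim patch):

	blk_status_t nvmf_fail_nonready_command(struct nvme_ctrl *ctrl,
			struct request *rq)
	{
		if (ctrl->state != NVME_CTRL_DEAD &&
		    !test_bit(NVME_CTRL_FAILFAST_EXPIRED, &ctrl->flags) &&
		    !blk_noretry_request(rq) &&
		    !(rq->cmd_flags & REQ_NVME_MPATH))
			return BLK_STS_RESOURCE;

		/* Fail fast: complete with a host path error. */
		nvme_req(rq)->status = NVME_SC_HOST_PATH_ERROR;
		blk_mq_start_request(rq);
		nvme_complete_rq(rq);
		return BLK_STS_OK;
	}

The corresponding nvme_available_path() change is in the diff below:
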
> > diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
> > index 54603bd..d8b7f45 100644
> > --- a/drivers/nvme/host/multipath.c
> > +++ b/drivers/nvme/host/multipath.c
> > @@ -278,9 +278,12 @@ static bool nvme_available_path(struct nvme_ns_head *head)
> >
> >   	list_for_each_entry_rcu(ns, &head->list, siblings) {
> >   		switch (ns->ctrl->state) {
> > +		case NVME_CTRL_CONNECTING:
> > +			if (test_bit(NVME_CTRL_FAILFAST_EXPIRED,
> > +				     &ns->ctrl->flags))
> > +				break;
> >   		case NVME_CTRL_LIVE:
> >   		case NVME_CTRL_RESETTING:
> > -		case NVME_CTRL_CONNECTING:
> >   			/* fallthru */
> >   			return true;
> >   		default:
> 
> This is too subtle to not document.
> The parameter is a controller property, but here it will affect the mpath device
> node.
> 
> This is changing the behavior of "queue as long as we have an available path"
> to "queue until all our paths said to fail fast".
> 
> I guess that by default we will have the same behavior, and the behavior
> will change only if all the controllers have the failfast parameter tuned.
> 
> At the very least it is an important undocumented change that needs to be
> called out in the change log.

Otherwise, the multipath layer may be stuck on a reconnecting controller,
possibly forever. Moreover, all commands would then be completed with an
error status, but the path would not be switched, and in that case the
presence of the additional path looks pointless.

I suggest using the failfast parameter for each path separately.
It can then also serve as the priority of each path.

Regards,
Victor

