[PATCH v9] nvme-fabrics: reject I/O to offline device

Tue Nov 17 03:39:32 EST 2020

>>> @@ -151,12 +151,16 @@ EXPORT_SYMBOL_GPL(nvme_try_sched_reset);
>>>    static void nvme_failfast_work(struct work_struct *work) {
>>>           struct nvme_ctrl *ctrl = container_of(to_delayed_work(work),
>>>                           struct nvme_ctrl, failfast_work);
>>> +       struct nvme_ns *ns;
>>>
>>> -       if (ctrl->state != NVME_CTRL_CONNECTING)
>>> -               return;
>>> -
>>> -
>>> -       set_bit(NVME_CTRL_FAILFAST_EXPIRED, &ctrl->flags);
>>> +       down_read(&ctrl->namespaces_rwsem);
>>> +       list_for_each_entry(ns, &ctrl->namespaces, list) {
>>> +               if (ctrl->state != NVME_CTRL_LIVE ||
>>> +                   (ns->ana_state != NVME_ANA_OPTIMIZED &&
>>> +                    ns->ana_state != NVME_ANA_NONOPTIMIZED))
>>> +                       set_bit(NVME_NS_FAILFAST_EXPIRED, &ns->flags);
>>> +       }
>>> +       up_read(&ctrl->namespaces_rwsem);
>>>           dev_info(ctrl->device, "failfast expired\n");
>>>
>>> ...and we could leave the failfast worker running even after the
>>> controller transitioned to LIVE.
>>> Cf the attached patch for details.
>>>
>>> Cheers,
>>>
>>> Hannes
>>> -- 
>>
>> I'm not sure it makes sense to move the FAILFAST_EXPIRED bit into
>> the namespace, because the failfast mechanism characterizes the
>> controller as a whole.
>>
> Oh, yes, I'm aware of that. But the problem here is with multipath; how 
> do we handle the situation where all controllers have the 
> 'failfast_expired' bit set?
> Should I/O be terminated (which I think it should, given that failfast 
> is supposed to terminate the I/O)?
> Or should I/O continue to run (as it does with your original patch)?

I do agree that fast_io_fail_tmo _is_ a controller attribute and should
remain as such.

I do see your point, Hannes; however, I also think it's problematic
that the host may fail arbitrary I/O if the controller happens to
enter the ANA inaccessible state (or hits a state transition timeout)
for a period longer than what the user happened to set (without
communicating any of this to the controller).

IFF we want to address this (I'm still not sure), we probably want
to activate the failfast timeout on entering an ANA state transition
(and clear it when we exit it). Then we can modify
nvme_available_path() to take NVME_CTRL_FAILFAST_EXPIRED into account.
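
To make the nvme_available_path() idea concrete, here is a rough
userspace model of the check I have in mind. This is not kernel code:
the model_* types and names are made up for illustration, and which
controller states count as "may still come back" is my assumption,
not something this patch defines.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's controller state and flags. */
enum model_ctrl_state {
	MODEL_CTRL_LIVE,
	MODEL_CTRL_RESETTING,
	MODEL_CTRL_CONNECTING,
	MODEL_CTRL_DELETING,
};

struct model_ctrl {
	enum model_ctrl_state state;
	bool failfast_expired;	/* models NVME_CTRL_FAILFAST_EXPIRED */
};

/*
 * Return true if at least one path may still serve I/O, meaning the
 * multipath layer should keep queueing. A path whose failfast timer
 * has expired is skipped regardless of its controller state.
 */
static bool model_available_path(const struct model_ctrl *paths, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (paths[i].failfast_expired)
			continue;
		switch (paths[i].state) {
		case MODEL_CTRL_LIVE:
		case MODEL_CTRL_RESETTING:
		case MODEL_CTRL_CONNECTING:
			return true;
		default:
			break;
		}
	}
	return false;
}
```

With this shape, I/O keeps being queued as long as one path has not
expired, and is failed only once every path has the bit set.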

Anyway, I think this can be an incremental patch, because the
current patch doesn't change today's behavior with respect to ANA
states (or transitions between them), i.e. outstanding I/O is queued.