[PATCH v6] nvme-fabrics: reject I/O to offline device
Hannes Reinecke
hare at suse.de
Thu Jul 2 10:48:09 EDT 2020
On 7/2/20 3:13 PM, Victor Gladkov wrote:
> Commands get stuck while the host NVMe controller (TCP or RDMA) is in
> the reconnect state. The NVMe controller enters the reconnect state when
> it loses the connection with the target. It tries to reconnect every 10
> seconds (default) until successful reconnection or until the reconnect
> time-out is reached. The default reconnect time-out is 10 minutes.
>
> To fix this long delay due to the default timeout, we introduce a new
> session parameter "fast_io_fail_tmo". The timeout is measured in
> seconds from the controller reconnect; any command beyond that
> timeout is rejected. The new parameter value may be passed during
> 'connect'.
> The default value of -1 means no timeout (similar to current behavior).
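(For illustration only: the option travels in the string written to
/dev/nvme-fabrics like any other fabrics parameter, so appending e.g.
"fast_io_fail_tmo=30" to "transport=tcp,traddr=...,nqn=..." would give
up on pending I/O 30 seconds into a reconnect; values hypothetical.)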
>
> We add a new controller flag NVME_CTRL_FAILFAST_EXPIRED and a
> corresponding delayed work item that sets this flag.
>
> When the controller enters the CONNECTING state, we schedule the
> delayed work based on the failfast timeout value. If the controller
> transitions out of CONNECTING, we cancel the delayed work item and
> ensure that NVME_CTRL_FAILFAST_EXPIRED is cleared. If the delayed
> work item expires, the NVME_CTRL_FAILFAST_EXPIRED flag is set.
>
> We also update the nvmf_fail_nonready_command() and
> nvme_available_path() functions to check the
> NVME_CTRL_FAILFAST_EXPIRED controller flag.
>
> Signed-off-by: Victor Gladkov <victor.gladkov at kioxia.com>
> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni at wdc.com>
>
> ---
> Changes From V5:
>
> 1. Drop the "off" string option for fast_io_fail_tmo.
>
> Changes From V4:
>
> 1. Remove subsysnqn from the dev_info message, just keep "failfast expired"
> in nvme_failfast_work().
> 2. Remove excess locking in nvme_failfast_work().
> 3. Change the "timeout disabled" value to -1; '0' now fails I/O right away.
> 4. Add an "off" string for the fast_io_fail_tmo option as a -1 equivalent.
>
> Changes From V3:
>
> 1. BUG_FIX: Fix a bug in nvme_start_failfast_work() and
> nvme_stop_failfast_work() when accessing ctrl->opts, as it will fail
> for the PCIe transport when nvme_change_ctrl_state() is called
> from nvme_reset_work(), since we don't set ctrl->opts for the
> PCIe transport.
> 2. Line wrap in nvme_start_failfast_work(), nvme_parse_option() and
> for macro NVMF_ALLOWED_OPTS definition.
> 3. Just like all the state change code add a switch for newly
> added state handling outside of the state machine in
> nvme_state_change().
> 4. nvme_available_path() add /* fallthru */ after if..break inside
> switch which is under list_for_each_entry_rcu().
> 5. Align the newly added nvmf_ctrl_options member fast_io_fail_tmo.
> 6. Fix the tabs before if in nvme_available_path() and line wrap
> for the same.
> 7. In nvme_failfast_work() use state != NVME_CTRL_CONNECTING
> instead of == to get rid of the parentheses and keep lines within
> 80 characters.
> 8. Get rid of the ";" at the end of the comment for @fast_io_fail_tmo.
> 9. Change the commit log style to match the one we have in the NVMe
> repo.
>
> Changes from V2:
>
> 1. Several coding style and small fixes.
> 2. Fix the comment for NVMF_DEF_FAIL_FAST_TMO.
> 3. Don't call functionality from the state machine.
> 4. Rename fast_fail_tmo -> fast_io_fail_tmo to match SCSI
> semantics.
>
> Changes from V1:
>
> 1. Add a new session parameter called "fast_fail_tmo". The timeout
> is measured in seconds from the controller reconnect; any command
> beyond that timeout is rejected. The new parameter value may be
> passed during 'connect', and its default value is 30 seconds.
> A value of 0 means no timeout (similar to current behavior).
> 2. Add a controller flag of "failfast_expired".
> 3. Add a dedicated delayed_work item that updates the "failfast_expired"
> controller flag.
> 4. When entering CONNECTING, schedule the delayed_work based on the
> failfast timeout value. If transitioning out of CONNECTING, terminate
> the delayed work item and ensure failfast_expired is false.
> If the delayed work item expires, set the "failfast_expired" flag to true.
> 5. Update nvmf_fail_nonready_command() to check the
> "failfast_expired" controller flag.
>
> ---
> drivers/nvme/host/core.c | 53 ++++++++++++++++++++++++++++++++++++++++++-
> drivers/nvme/host/fabrics.c | 25 +++++++++++++++++---
> drivers/nvme/host/fabrics.h | 5 ++++
> drivers/nvme/host/multipath.c | 5 +++-
> drivers/nvme/host/nvme.h | 3 +++
> 5 files changed, 86 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index f3c037f..f704b20 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -137,6 +137,41 @@ int nvme_try_sched_reset(struct nvme_ctrl *ctrl)
> }
> EXPORT_SYMBOL_GPL(nvme_try_sched_reset);
>
> +static void nvme_failfast_work(struct work_struct *work)
> +{
> + struct nvme_ctrl *ctrl = container_of(to_delayed_work(work),
> + struct nvme_ctrl, failfast_work);
> +
> + if (ctrl->state != NVME_CTRL_CONNECTING)
> + return;
> +
> + set_bit(NVME_CTRL_FAILFAST_EXPIRED, &ctrl->flags);
> + dev_info(ctrl->device, "failfast expired\n");
> + nvme_kick_requeue_lists(ctrl);
> +}
> +
> +static inline void nvme_start_failfast_work(struct nvme_ctrl *ctrl)
> +{
> + if (!ctrl->opts || ctrl->opts->fast_io_fail_tmo == -1)
> + return;
> +
> + if (ctrl->opts->fast_io_fail_tmo == 0) {
> + nvme_failfast_work(&ctrl->failfast_work.work);
> + return;
> + }
> +
> + schedule_delayed_work(&ctrl->failfast_work,
> + ctrl->opts->fast_io_fail_tmo * HZ);
> +}
> +
No need for the 'if' clause here; schedule_delayed_work() will queue
the work for immediate execution if the delay is 0.
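IOW, something like this (untested sketch) should do:

static inline void nvme_start_failfast_work(struct nvme_ctrl *ctrl)
{
	if (!ctrl->opts || ctrl->opts->fast_io_fail_tmo == -1)
		return;

	/* a delay of 0 queues the work for immediate execution */
	schedule_delayed_work(&ctrl->failfast_work,
			      ctrl->opts->fast_io_fail_tmo * HZ);
}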
> +static inline void nvme_stop_failfast_work(struct nvme_ctrl *ctrl)
> +{
> + if (!ctrl->opts || ctrl->opts->fast_io_fail_tmo == -1)
> + return;
> + cancel_delayed_work_sync(&ctrl->failfast_work);
> + clear_bit(NVME_CTRL_FAILFAST_EXPIRED, &ctrl->flags);
> +}
> +
I'd rather call cancel_delayed_work_sync() prior to the check for
fast_io_fail_tmo, as otherwise we're not cancelling a pending work
item if fast_io_fail_tmo changed while it was queued.
(Unlikely to happen now, I know. But still.)
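Something like (untested sketch; assumes failfast_work is initialized
for every transport in nvme_init_ctrl(), so the cancel is always safe):

static inline void nvme_stop_failfast_work(struct nvme_ctrl *ctrl)
{
	/* cancel first, so a pending item is stopped even if opts changed */
	cancel_delayed_work_sync(&ctrl->failfast_work);
	if (!ctrl->opts || ctrl->opts->fast_io_fail_tmo == -1)
		return;
	clear_bit(NVME_CTRL_FAILFAST_EXPIRED, &ctrl->flags);
}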
Cheers,
Hannes
--
Dr. Hannes Reinecke Teamlead Storage & Networking
hare at suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer