[PATCH V4] nvme-fabrics: reject I/O to offline device
Hannes Reinecke
hare at suse.de
Thu Jul 2 10:35:43 EDT 2020
On 6/29/20 5:09 PM, Victor Gladkov wrote:
> I'd like to remind you that this improvement is awaiting for commit.
>
> Commands get stuck while Host NVMe controller (TCP or RDMA) is in
> reconnect state. NVMe controller enters into reconnect state when it
> loses connection with the target. It tries to reconnect every 10
> seconds (default) until successful reconnection or until reconnect
> time-out is reached. The default reconnect time out is 10 minutes.
>
> To fix this long delay due to the default timeout we introduce new
> session parameter "fast_io_fail_tmo". The timeout is measured in
> seconds from the controller reconnect, any command beyond that
> timeout is rejected. The new parameter value may be passed during
> 'connect'.
> The default value of 0 means no timeout (similar to current behavior).
>
> We add a new controller NVME_CTRL_FAILFAST_EXPIRED and respective
> delayed work that updates the NVME_CTRL_FAILFAST_EXPIRED flag.
>
> When the controller is entering the CONNECTING state, we schedule
> the delayed_work based on failfast timeout value. If the transition
> is out of CONNECTING, terminate delayed work item and ensure
> failfast_expired is false. If delayed work item expires then set
> "NVME_CTRL_FAILFAST_EXPIRED" flag to true.
>
> We also update nvmf_fail_nonready_command() and
> nvme_available_path() functions with check the
> "NVME_CTRL_FAILFAST_EXPIRED" controller flag.
>
> Signed-off-by: Victor Gladkov <victor.gladkov at kioxia.com>
> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni at wdc.com>
>
> ---
> Changes From V3:
>
> 1. BUG_FIX: Fix a bug in nvme_start_failfast_work() amd
> nvme_stop_failfast_work() when accessing ctrl->opts as it will fail
> for PCIe transport when is called nvme_change_ctrl_state()
> from nvme_reset_work(), since we don't set the ctrl->opts for
> PCIe transport.
> 2. Line wrap in nvme_start_failfast_work(), nvme_parse_option() and
> for macro NVMF_ALLOWED_OPTS definition.
> 3. Just like all the state change code add a switch for newly
> added state handling outside of the state machine in
> nvme_state_change().
> 4. nvme_available_path() add /* fallthru */ after if..break inside
> switch which is under list_for_each_entry_rcu().
> 5. Align newly added member nvmf_ctrl_options fast_io_fail_tmp.
> 6. Fix the tabs before if in nvme_available_path() and line wrap
> for the same.
> 7. In nvme_failfast_work() use state != NVME_CTRL_CONNECTING
> instead of == to get rid of the parentheses and avoid char > 80.
> 8. Get rid of the ";" at the end of the comment for @fast_io_fail_tmp.
> 9. Change the commit log style to match the one we have in the NVMe
> repo.
>
> Changes from V2:
>
> 1. Several coding style and small fixes.
> 2. Fix the comment for NVMF_DEF_FAIL_FAST_TMO.
> 3. Don't call functionality from the state machine.
> 4. Rename fast_fail_tmo -> fast_io_fail_tmo to match SCSI
> semantics.
>
> Changes from V1:
>
> 1. Add a new session parameter called "fast_fail_tmo". The timeout
> is measured in seconds from the controller reconnect, any command
> beyond that timeout is rejected. The new parameter value may be
> passed during 'connect', and its default value is 30 seconds.
> A value of 0 means no timeout (similar to current behavior).
> 2. Add a controller flag of "failfast_expired".
> 3. Add dedicated delayed_work that update the "failfast_expired"
> controller flag.
> 4. When entering CONNECTING, schedule the delayed_work based on
> failfast timeout value. If transition out of CONNECTING, terminate
> delayed work item and ensure failfast_expired is false.
> If delayed work item expires: set "failfast_expired" flag to true.
> 5. Update nvmf_fail_nonready_command() with check the
> "failfast_expired" controller flag.
>
>
> ---
> drivers/nvme/host/core.c | 51 ++++++++++++++++++++++++++++++++++++++++++-
> drivers/nvme/host/fabrics.c | 30 ++++++++++++++++++++-----
> drivers/nvme/host/fabrics.h | 5 +++++
> drivers/nvme/host/multipath.c | 6 ++++-
> drivers/nvme/host/nvme.h | 3 +++
> 5 files changed, 88 insertions(+), 7 deletions(-)
>
Reviewed-by: Hannes Reinecke <hare at suse.de>
Cheers,
Hanne
--
Dr. Hannes Reinecke Teamlead Storage & Networking
hare at suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
More information about the Linux-nvme
mailing list