[PATCH 4/4] nvme: set 'failfast_expired' in nvme_remove_namespaces()
Sagi Grimberg
sagi at grimberg.me
Fri Sep 6 00:38:43 PDT 2024
Please describe what the patch fixes in the title, not what it does.
On 06/09/2024 10:18, Hannes Reinecke wrote:
> nvme_remove_namespaces() is only called when the controller is
> being removed. If there is a scan process still pending and
> the I/O from that process cannot make progress (eg if all paths
> are in ANA state 'inaccessible') we cannot disconnect the
> controller as the 'nvme disconnect' process will hang in
> flush_work(&ctrl->scan_work).
Please include the hang process stack.
>
> This patch sets the 'failfast_expired' bit for the controller
> to cause all pending I/O to be failed, and the disconnect process
> to complete.
How did you reproduce it? By triggering namespace scanning and
disconnect-all in a loop? Can we get a blktest for it?
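
For readers following the thread, here is a toy userspace model of the
behaviour the commit message describes. It is only a sketch under stated
assumptions: every name in it (toy_ctrl, toy_submit, toy_flush_scan) is
hypothetical and none of it is kernel code. It assumes that I/O on an
inaccessible path is requeued indefinitely unless a failfast flag is set,
in which case it is failed instead, so whoever waits for it can return.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
enum toy_path_state { PATH_OPTIMIZED, PATH_INACCESSIBLE };
enum toy_io_result { IO_COMPLETED, IO_REQUEUED, IO_FAILED };

struct toy_ctrl {
	enum toy_path_state path;	/* models the ANA state of the only path */
	bool failfast_expired;		/* models the 'failfast_expired' bit */
};

/* One submission attempt for a pending request. */
static enum toy_io_result toy_submit(const struct toy_ctrl *ctrl)
{
	if (ctrl->path == PATH_OPTIMIZED)
		return IO_COMPLETED;	/* a usable path exists */
	if (ctrl->failfast_expired)
		return IO_FAILED;	/* no path, but fail instead of waiting */
	return IO_REQUEUED;		/* no path, keep waiting */
}

/*
 * Models a waiter on the scan I/O. Returns the attempt on which the
 * request finished (completed or failed), or -1 if it was still being
 * requeued after max_tries attempts, i.e. the waiter would hang.
 */
static int toy_flush_scan(const struct toy_ctrl *ctrl, int max_tries)
{
	for (int i = 1; i <= max_tries; i++) {
		if (toy_submit(ctrl) != IO_REQUEUED)
			return i;
	}
	return -1;
}
```

In this model, toy_flush_scan() on a controller whose path is
PATH_INACCESSIBLE returns -1 while failfast_expired is false, and returns
1 once the flag is set, because the request is failed on the first attempt.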
>
> Signed-off-by: Hannes Reinecke <hare at kernel.org>
> ---
> drivers/nvme/host/core.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 651073280f6f..b968b672dcf8 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -4222,6 +4222,13 @@ void nvme_remove_namespaces(struct nvme_ctrl *ctrl)
>  	 */
>  	nvme_mpath_clear_ctrl_paths(ctrl);
>
> +	/*
> +	 * Mark the controller as 'failfast' to ensure all pending I/O
> +	 * is killed.
> +	 */
> +	set_bit(NVME_CTRL_FAILFAST_EXPIRED, &ctrl->flags);
What about nvme_stop_failfast_work()?
> +	nvme_kick_requeue_lists(ctrl);
> +
>  	/*
>  	 * Unquiesce io queues so any pending IO won't hang, especially
>  	 * those submitted from scan work