[PATCH 4/4] nvme: set 'failfast_expired' in nvme_remove_namespaces()

Sagi Grimberg sagi at grimberg.me
Fri Sep 6 00:38:43 PDT 2024


Please describe what the patch fixes in the title, not what it does.


On 06/09/2024 10:18, Hannes Reinecke wrote:
> nvme_remove_namespaces() is only called when the controller is
> being removed. If there is a scan process still pending and
> the I/O from that process cannot make progress (eg if all paths
> are in ANA state 'inaccessible') we cannot disconnect the
> controller as the 'nvme disconnect' process will hang in
> flush_work(&ctrl->scan_work).

Please include the stack trace of the hung process.

>
> This patch sets the 'failfast_expired' bit for the controller
> to cause all pending I/O to be failed, and the disconnect process
> to complete.

How did you reproduce it? By triggering namespace scanning and
disconnect-all in a loop? Can we get a blktest for it?

>
> Signed-off-by: Hannes Reinecke <hare at kernel.org>
> ---
>   drivers/nvme/host/core.c | 7 +++++++
>   1 file changed, 7 insertions(+)
>
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 651073280f6f..b968b672dcf8 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -4222,6 +4222,13 @@ void nvme_remove_namespaces(struct nvme_ctrl *ctrl)
>   	 */
>   	nvme_mpath_clear_ctrl_paths(ctrl);
>   
> +	/*
> +	 * Mark the controller as 'failfast' to ensure all pending I/O
> +	 * is killed.
> +	 */
> +	set_bit(NVME_CTRL_FAILFAST_EXPIRED, &ctrl->flags);

What about nvme_stop_failfast_work()?

> +	nvme_kick_requeue_lists(ctrl);
> +
>   	/*
>   	 * Unquiesce io queues so any pending IO won't hang, especially
>   	 * those submitted from scan work



