[RFC PATCH 1/2] nvme-multipath: introduce delayed removal of the multipath head node

Christoph Hellwig hch at lst.de
Mon Apr 7 07:44:13 PDT 2025


> @@ -3690,6 +3690,10 @@ static struct nvme_ns_head *nvme_alloc_ns_head(struct nvme_ctrl *ctrl,
>  	ratelimit_state_init(&head->rs_nuse, 5 * HZ, 1);
>  	ratelimit_set_flags(&head->rs_nuse, RATELIMIT_MSG_ON_RELEASE);
>  	kref_init(&head->ref);
> +#ifdef CONFIG_NVME_MULTIPATH
> +	if (ctrl->ops->flags & NVME_F_FABRICS)
> +		set_bit(NVME_NSHEAD_FABRICS, &head->flags);
> +#endif

We might want to make the flag unconditional, or move this into a
helper to avoid the ifdef'ery, if we keep the flag at all (see below).
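
Something along these lines (untested, helper name made up) next to the
other multipath stubs in drivers/nvme/host/nvme.h would keep the ifdef
out of nvme_alloc_ns_head():

#ifdef CONFIG_NVME_MULTIPATH
static inline void nvme_mpath_set_fabrics(struct nvme_ctrl *ctrl,
		struct nvme_ns_head *head)
{
	/* Mark the head as fabrics-attached for the removal path. */
	if (ctrl->ops->flags & NVME_F_FABRICS)
		set_bit(NVME_NSHEAD_FABRICS, &head->flags);
}
#else
static inline void nvme_mpath_set_fabrics(struct nvme_ctrl *ctrl,
		struct nvme_ns_head *head)
{
}
#endif

with the call site reduced to:

	nvme_mpath_set_fabrics(ctrl, head);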

> -	if (last_path)
> -		nvme_mpath_shutdown_disk(ns->head);
> +	nvme_mpath_shutdown_disk(ns->head);

I guess this function is where the shutdown naming came from, and it
was probably a bad idea even back then.

Maybe throw in an extra patch to rename it as well.

> +	/*
> +	 * For non-fabric controllers we support delayed removal of head disk
> +	 * node. If we reached up to here then it means that head disk is still
> +	 * alive and so we assume here that even if there's no path available
> +	 * maybe due to the transient link failure, we could queue up the IO
> +	 * and later when path becomes ready we re-submit queued IO.
> +	 */
> +	if (!(test_bit(NVME_NSHEAD_FABRICS, &head->flags)))
> +		return true;

Why is this conditional on whether the controller is fabrics-attached?
The same rationale should apply just as much, if not more, to fabrics
controllers.

Also, there's no need for the extra set of braces around the test_bit()
call.
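i.e. just:

	if (!test_bit(NVME_NSHEAD_FABRICS, &head->flags))
		return true;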

>  }
>  
> +static void nvme_remove_head(struct nvme_ns_head *head)
> +{
> +	if (test_and_clear_bit(NVME_NSHEAD_DISK_LIVE, &head->flags)) {
> +		/*
> +		 * requeue I/O after NVME_NSHEAD_DISK_LIVE has been cleared
> +		 * to allow multipath to fail all I/O.
> +		 */

Full sentences are supposed to start with a capitalized word.

(Yes, I know this code just moved, but it's a good chance to fix it.)
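
So just:

	/*
	 * Requeue I/O after NVME_NSHEAD_DISK_LIVE has been cleared
	 * to allow multipath to fail all I/O.
	 */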



