[PATCH 2/3] nvme-multipath: cannot disconnect controller on stuck partition scan

Christoph Hellwig hch at lst.de
Mon Oct 7 23:43:34 PDT 2024


On Mon, Oct 07, 2024 at 12:19:23PM -0600, Keith Busch wrote:
> On Mon, Oct 07, 2024 at 12:01:33PM +0200, Hannes Reinecke wrote:
> > @@ -239,6 +239,13 @@ static bool nvme_path_is_disabled(struct nvme_ns *ns)
> >  {
> >  	enum nvme_ctrl_state state = nvme_ctrl_state(ns->ctrl);
> >  
> > +	/*
> > +	 * Skip deleted controllers for I/O from partition scan
> > +	 */
> > +	if (state == NVME_CTRL_DELETING &&
> > +	    mutex_is_locked(&ns->ctrl->scan_lock))
> > +		return true;
> 
> This feels off to me, using these seemingly unrelated dependencies to
> make these kinds of decisions.
> 
> We talked a couple weeks ago about suppressing the partition scanning
> during nvme scan_work. I know you said there was some reason it wouldn't
> work, but could you check the below? It seems okay to me.
> 
> ---
> diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
> index 48e7a8906d012..82cb1eb3a773b 100644
> --- a/drivers/nvme/host/multipath.c
> +++ b/drivers/nvme/host/multipath.c
> @@ -586,6 +586,12 @@ static void nvme_requeue_work(struct work_struct *work)
>  		container_of(work, struct nvme_ns_head, requeue_work);
>  	struct bio *bio, *next;
>  
> +	if (test_and_clear_bit(GD_SUPPRESS_PART_SCAN, &head->disk->state)) {
> +		mutex_lock(&head->disk->open_mutex);
> +		bdev_disk_changed(head->disk, false);
> +		mutex_unlock(&head->disk->open_mutex);
> +	}
> +

requeue_work feels a little counterintuitive for the partition scanning.
I guess this should work because requeue_work is scheduled from the end
of nvme_mpath_set_live and gives us a context outside of scan_lock.

It'll need really good comments to explain this.
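
To make the ordering concrete, here is a toy user-space model of it. This is
only a sketch: every model_* name is made up for illustration and none of it
is kernel code. It also assumes the flag standing in for GD_SUPPRESS_PART_SCAN
is already set before the disk goes live, which the quoted hunk does not show.

```c
/* Toy user-space model; all model_* names are hypothetical, not kernel APIs. */
#include <assert.h>
#include <stdbool.h>

struct model_ctrl {
	bool scan_lock_held;		/* stands in for ctrl->scan_lock */
};

struct model_disk {
	bool suppress_part_scan;	/* stands in for GD_SUPPRESS_PART_SCAN */
	int part_scans;			/* number of partition scans that ran */
	bool requeue_pending;		/* stands in for a queued requeue_work */
};

/* Stands in for bdev_disk_changed(): issues I/O, so never under scan_lock. */
static void model_partition_scan(struct model_ctrl *ctrl,
				 struct model_disk *disk)
{
	assert(!ctrl->scan_lock_held);
	disk->part_scans++;
}

/* Stands in for scan_work: adds the disk and sets the path live under lock. */
static void model_scan_work(struct model_ctrl *ctrl, struct model_disk *disk)
{
	ctrl->scan_lock_held = true;
	/* Adding the disk would scan partitions here unless suppressed. */
	if (!disk->suppress_part_scan)
		model_partition_scan(ctrl, disk);
	/* End of the set-live step: kick the requeue work. */
	disk->requeue_pending = true;
	ctrl->scan_lock_held = false;
}

/* Stands in for nvme_requeue_work(): runs later, outside scan_lock. */
static void model_requeue_work(struct model_ctrl *ctrl,
			       struct model_disk *disk)
{
	disk->requeue_pending = false;
	/* Mirrors test_and_clear_bit(): the deferred scan runs only once. */
	if (disk->suppress_part_scan) {
		disk->suppress_part_scan = false;
		model_partition_scan(ctrl, disk);
	}
}
```

In this model the scan under the lock is skipped, the first requeue run does
the one deferred scan with the lock dropped, and later requeue runs do not
rescan.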



