[PATCH 2/3] nvme-multipath: cannot disconnect controller on stuck partition scan

Keith Busch kbusch at kernel.org
Tue Oct 15 07:33:19 PDT 2024


On Thu, Oct 10, 2024 at 10:57:23AM +0200, Hannes Reinecke wrote:
> On 10/9/24 19:32, Keith Busch wrote:
> > @@ -974,14 +991,14 @@ void nvme_mpath_shutdown_disk(struct nvme_ns_head *head)
> >   		return;
> >   	if (test_and_clear_bit(NVME_NSHEAD_DISK_LIVE, &head->flags)) {
> >   		nvme_cdev_del(&head->cdev, &head->cdev_device);
> > +		/*
> > +		 * requeue I/O after NVME_NSHEAD_DISK_LIVE has been cleared
> > +		 * to allow multipath to fail all I/O.
> > +		 */
> > +		synchronize_srcu(&head->srcu);
> > +		kblockd_schedule_work(&head->requeue_work);
> >   		del_gendisk(head->disk);
> >   	}
> 
> I guess we need to split 'test_and_clear_bit()' into a 'test_bit()' when
> testing for the condition and a 'clear_bit()' after del_gendisk().
> 
> Otherwise we're having a race condition with nvme_mpath_set_live:
> 
>         if (!test_and_set_bit(NVME_NSHEAD_DISK_LIVE, &head->flags)) {
>                 rc = device_add_disk(&head->subsys->dev, head->disk,
>                                      nvme_ns_attr_groups);
> 
> which could sneak in from another controller just after we cleared the
> NVME_NSHEAD_DISK_LIVE flag, causing device_add_disk() to fail as the
> same name is already registered.
> Or nvme_cdev_del() to display nice kernel warnings as the cdev was
> not registered.

Is that actually happening for you? I don't think it's supposed to
happen because mpath_shutdwon_disk only happens if the head is detached
from the subsystem, so no other thread should be able to possibly
attempt to access that head's disk for a different controller path. That
new controller should have to allocate an entirely new head.



More information about the Linux-nvme mailing list