[PATCH 2/3] nvme-multipath: cannot disconnect controller on stuck partition scan
Hannes Reinecke
hare at suse.de
Tue Oct 15 07:56:08 PDT 2024
On 10/15/24 16:33, Keith Busch wrote:
> On Thu, Oct 10, 2024 at 10:57:23AM +0200, Hannes Reinecke wrote:
>> On 10/9/24 19:32, Keith Busch wrote:
>>> @@ -974,14 +991,14 @@ void nvme_mpath_shutdown_disk(struct nvme_ns_head *head)
>>> return;
>>> if (test_and_clear_bit(NVME_NSHEAD_DISK_LIVE, &head->flags)) {
>>> nvme_cdev_del(&head->cdev, &head->cdev_device);
>>> + /*
>>> + * requeue I/O after NVME_NSHEAD_DISK_LIVE has been cleared
>>> + * to allow multipath to fail all I/O.
>>> + */
>>> + synchronize_srcu(&head->srcu);
>>> + kblockd_schedule_work(&head->requeue_work);
>>> del_gendisk(head->disk);
>>> }
>>
>> I guess we need to split 'test_and_clear_bit()' into a 'test_bit()' when
>> testing for the condition and a 'clear_bit()' after del_gendisk().
>>
>> Otherwise we're having a race condition with nvme_mpath_set_live:
>>
>> if (!test_and_set_bit(NVME_NSHEAD_DISK_LIVE, &head->flags)) {
>> rc = device_add_disk(&head->subsys->dev, head->disk,
>> nvme_ns_attr_groups);
>>
>> which could sneak in from another controller just after we cleared the
>> NVME_NSHEAD_DISK_LIVE flag, causing device_add_disk() to fail as the
>> same name is already registered.
>> Or nvme_cdev_del() to display nice kernel warnings as the cdev was
>> not registered.
>
> Is that actually happening for you? I don't think it's supposed to
> happen because mpath_shutdwon_disk only happens if the head is detached
> from the subsystem, so no other thread should be able to possibly
> attempt to access that head's disk for a different controller path. That
> new controller should have to allocate an entirely new head.
Na, forget it. Not only doesn't it happen, but we need the
'test_and_set_bit()' at the start otherwise we can't abort I/O.
So all's fine with your patch, and it even survives my testcase.
(The first to do so even after several runs, I might add :-)
Care to send an official patch?
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare at suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
More information about the Linux-nvme
mailing list