[PATCH 3/3] nvme: 'nvme disconnect' hangs after remapping namespaces

Hannes Reinecke hare at suse.de
Tue Sep 10 01:23:03 PDT 2024


On 9/10/24 09:57, Sagi Grimberg wrote:
> On 09/09/2024 10:19, Hannes Reinecke wrote:
>> During repetitive namespace map and unmap operations on the target
>> (disabling the namespace, changing the UUID, enabling it again)
>> the initial scan on the host will hang, as the target keeps
>> returning PATH_ERROR and the I/O is retried indefinitely:
>>
>> [<0>] folio_wait_bit_common+0x12a/0x310
>> [<0>] filemap_read_folio+0x97/0xd0
>> [<0>] do_read_cache_folio+0x108/0x390
>> [<0>] read_part_sector+0x31/0xa0
>> [<0>] read_lba+0xc5/0x160
>> [<0>] efi_partition+0xd9/0x8f0
>> [<0>] bdev_disk_changed+0x23d/0x6d0
>> [<0>] blkdev_get_whole+0x78/0xc0
>> [<0>] bdev_open+0x2c6/0x3b0
>> [<0>] bdev_file_open_by_dev+0xcb/0x120
>> [<0>] disk_scan_partitions+0x5d/0x100
>> [<0>] device_add_disk+0x402/0x420
>> [<0>] nvme_mpath_set_live+0x4f/0x1f0 [nvme_core]
>> [<0>] nvme_mpath_add_disk+0x107/0x120 [nvme_core]
>> [<0>] nvme_alloc_ns+0xac6/0xe60 [nvme_core]
>> [<0>] nvme_scan_ns+0x2dd/0x3e0 [nvme_core]
>> [<0>] nvme_scan_work+0x1a3/0x490 [nvme_core]
>>
>> Calling 'nvme disconnect' on controllers with these namespaces
>> will hang as the disconnect operation tries to flush scan_work:
>>
>> [<0>] __flush_work+0x389/0x4b0
>> [<0>] nvme_remove_namespaces+0x4b/0x130 [nvme_core]
>> [<0>] nvme_do_delete_ctrl+0x72/0x90 [nvme_core]
>> [<0>] nvme_delete_ctrl_sync+0x2e/0x40 [nvme_core]
>> [<0>] nvme_sysfs_delete+0x35/0x40 [nvme_core]
>> [<0>] kernfs_fop_write_iter+0x13d/0x1b0
>> [<0>] vfs_write+0x404/0x510
>>
>> The flush happens before the namespaces are removed, and the
>> controller state DELETING_NOIO (which would abort any pending I/O)
>> is only set afterwards.
>>
>> This patch calls 'nvme_kick_requeue_lists()' when the controller
>> enters the DELETING state to ensure all pending I/O is flushed,
>> and also disables failover for any commands that complete with
>> an error afterwards, breaking the infinite retry loop.
>>
>> Signed-off-by: Hannes Reinecke <hare at kernel.org>
>> ---
>>   drivers/nvme/host/core.c | 7 ++++++-
>>   1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>> index 651073280f6f..142babce1963 100644
>> --- a/drivers/nvme/host/core.c
>> +++ b/drivers/nvme/host/core.c
>> @@ -381,6 +381,8 @@ enum nvme_disposition {
>>   static inline enum nvme_disposition nvme_decide_disposition(struct request *req)
>>   {
>> +    struct nvme_ctrl *ctrl = nvme_req(req)->ctrl;
>> +
>>       if (likely(nvme_req(req)->status == 0))
>>           return COMPLETE;
>> @@ -393,6 +395,8 @@ static inline enum nvme_disposition nvme_decide_disposition(struct request *req)
>>           return AUTHENTICATE;
>>       if (req->cmd_flags & REQ_NVME_MPATH) {
>> +        if (nvme_ctrl_state(ctrl) == NVME_CTRL_DELETING)
>> +            return COMPLETE;
> 
> This looks wrong. What if I disconnect from one path and there are other
> eligible paths?

Yes, correct. We should check whether there are paths left and fail
over if there are, but complete the request if this is the last path
and we got a path error:

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 651073280f6f..73b21c01b165 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -394,8 +394,17 @@ static inline enum nvme_disposition nvme_decide_disposition(struct request *req)

         if (req->cmd_flags & REQ_NVME_MPATH) {
                 if (nvme_is_path_error(nvme_req(req)->status) ||
-                   blk_queue_dying(req->q))
+                   blk_queue_dying(req->q)) {
+                       struct nvme_ns *ns = req->q->queuedata;
+                       /*
+                        * Always complete if this is the last path
+                        * and the controller is being deleted.
+                        */
+                       if (list_is_singular(&ns->head->list) &&
+                           nvme_ctrl_state(ns->ctrl) == NVME_CTRL_DELETING)
+                               return COMPLETE;
                         return FAILOVER;
+               }
         } else {
                 if (blk_queue_dying(req->q))
                         return COMPLETE;

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare at suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
