[PATCH v2 for-5.8-rc 1/6] nvme: fix possible deadlock when I/O is blocked

Anthony Iliopoulos ailiop at suse.com
Tue Jul 7 06:57:35 EDT 2020


On Tue, Jun 23, 2020 at 11:54:31PM -0700, Sagi Grimberg wrote:
> 
> > > Revert fab7772bfbcf ("nvme-multipath: revalidate nvme_ns_head gendisk
> > > in nvme_validate_ns")
> > > 
> > > When adding a new namespace to the head disk (via nvme_mpath_set_live)
> > > we will see partition scan which triggers I/O on the mpath device node.
> > > This process will usually be triggered from the scan_work which holds
> > > the scan_lock. If I/O blocks (if we got ana change currently have only
> > > available paths but none are accessible) this can deadlock on the head
> > > disk bd_mutex as both partition scan I/O takes it, and head disk revalidation
> > > takes it to check for resize (also triggered from scan_work on a different
> > > path). See trace [1].
> > > 
> > > This is no longer needed since commit cb224c3af4df ("nvme: Convert to
> > > use set_capacity_revalidate_and_notify") which already updates resize
> > > info without unnecessarily revalidating the disk.
> > 
> > Did you look if any other revalidation gets skipped this way? E.g.
> > LBA size change?
> > 
> >  From purely the size POV this looks good, though.
> 
> The only reason the revalidate was added was to update on a size change,
> which is redundant.

The revalidation is still required there in order to update the blockdev
size when multipath is enabled. With this revert, the behavior regressed
back to before fab7772bfbcf, so the size isn't updated under multipath
when the device is in use (e.g. mounted), it will only be effectively
updated next time it is mounted as a side-effect of blkdev_open on first
opener. This prevents online resizing of filesystems.

> And if you look at revalidate, it only calls fops->revalidate_disk
> (which is a noop in the mpath device) and checks the size change.

The mpath gendisk doesn't have a revalidate_disk fop indeed, but call to
check_disk_size_change was needed to update its size. Otherwise, the
size change isn't reflected externally to userspace before next mount.
cb224c3af4df doesn't fix this as it skips revalidate and thus also skips
updating the size.

The issue is that under CONFIG_NVME_MULTIPATH=y the ns->disk becomes a
hidden gendisk (see nvme_set_disk_name) and thus doesn't have a backing
bdev, while ns->head->disk becomes the "visible" one to userspace (with
a backing bdev), and therefore needs its bdev->bd_inode->i_size updated
when resized.

The following patch seems to be working, although I'm not sufficiently
familiar with the code to tell if this won't cause any other deadlocks
with some confidence:

>From 607221431d99503fa7ff0091bf6277d946ef9930 Mon Sep 17 00:00:00 2001
From: Anthony Iliopoulos <ailiop at suse.com>
Date: Tue, 7 Jul 2020 12:42:54 +0200
Subject: nvme: explicitly update mpath disk capacity on revalidation

Commit 3b4b19721ec652 ("nvme: fix possible deadlock when I/O is
blocked") reverted multipath head disk revalidation due to deadlocks
caused by holding the bd_mutex during revalidate.

Updating the multipath disk blockdev size is still required though for
userspace to be able to observe any resizing while the device is
mounted. Directly update the bdev inode size to avoid unnecessarily
holding the bdev->bd_mutex.

Fixes: 3b4b19721ec652 ("nvme: fix possible deadlock when I/O is
blocked")

Signed-off-by: Anthony Iliopoulos <ailiop at suse.com>
---
 drivers/nvme/host/core.c |  1 +
 drivers/nvme/host/nvme.h | 13 +++++++++++++
 2 files changed, 14 insertions(+)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 8410d03b940d74..add040168e67e2 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1980,6 +1980,7 @@ static int __nvme_revalidate_disk(struct gendisk *disk, struct nvme_id_ns *id)
 	if (ns->head->disk) {
 		nvme_update_disk_info(ns->head->disk, ns, id);
 		blk_queue_stack_limits(ns->head->disk->queue, ns->queue);
+		nvme_mpath_update_disk_size(ns->head->disk);
 	}
 #endif
 	return 0;
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 2ef8d501e2a87c..1de3f9b827aa56 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -604,6 +604,16 @@ static inline void nvme_trace_bio_complete(struct request *req,
 		trace_block_bio_complete(ns->head->disk->queue, req->bio);
 }

+static inline void nvme_mpath_update_disk_size(struct gendisk *disk)
+{
+	struct block_device *bdev = bdget_disk(disk, 0);
+
+	if (bdev) {
+		bd_set_size(bdev, get_capacity(disk) << SECTOR_SHIFT);
+		bdput(bdev);
+	}
+}
+
 extern struct device_attribute dev_attr_ana_grpid;
 extern struct device_attribute dev_attr_ana_state;
 extern struct device_attribute subsys_attr_iopolicy;
@@ -679,6 +689,9 @@ static inline void nvme_mpath_wait_freeze(struct nvme_subsystem *subsys)
 static inline void nvme_mpath_start_freeze(struct nvme_subsystem *subsys)
 {
 }
+static inline void nvme_mpath_update_disk_size(struct gendisk *disk)
+{
+}
 #endif /* CONFIG_NVME_MULTIPATH */

 #ifdef CONFIG_NVM
--
2.27.0



More information about the Linux-nvme mailing list