lockdep WARNING at blktests block/011

Shinichiro Kawasaki shinichiro.kawasaki at wdc.com
Wed Oct 5 01:31:08 PDT 2022


On Oct 04, 2022 / 21:23, Shin'ichiro Kawasaki wrote:
> On Oct 04, 2022 / 20:10, Tetsuo Handa wrote:
> > On 2022/10/04 19:44, Shinichiro Kawasaki wrote:
> > > Any comment on this patch will be appreciated. If this approach is ok,
> > > I'll post it as a formal patch for review.
> > 
> > I don't want you to make such a change.
> > 
> > I saw a case where a real deadlock was hidden by lockdep_set_novalidate_class().
> > 
> >   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=62ebaf2f9261cd2367ae928a39343fcdbfe9f877
> >   https://groups.google.com/g/syzkaller-bugs/c/Uj9LqEUCwac/m/BhdTjWhNAQAJ
> > 
> > In general, this kind of deadlock possibility is better addressed by bringing the
> > problematic locks out of the cancel{,_delayed}_work_sync() section.
> > 
> >   https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=a91b750fd6629354460282bbf5146c01b05c4859
> 
> Thanks for the comment. Then, I think the question is how to move the
> blk_sync_queue() call out of the ctrl->namespaces_rwsem critical section in
> nvme_sync_io_queues():
> 
> void nvme_sync_io_queues(struct nvme_ctrl *ctrl)
> {
> 	struct nvme_ns *ns;
> 
> 	down_read(&ctrl->namespaces_rwsem);
> 	list_for_each_entry(ns, &ctrl->namespaces, list)
> 		blk_sync_queue(ns->queue);
> 	up_read(&ctrl->namespaces_rwsem);
> }
> 
> I'm not yet sure how we can do it. I guess we need to copy the ctrl->namespaces
> list to a temporary array that can be referenced outside of the critical section.
> We also need to take a kref on each ns so that it is not freed. I will try to
> implement this tomorrow.
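
For reference, a rough and untested sketch of that kref-based idea could look
like the code below. The helper name nvme_sync_io_queues_krefs() is made up;
it assumes it would live next to nvme_sync_io_queues() in
drivers/nvme/host/core.c, where kref_get(&ns->kref) and nvme_put_ns() are
available:

static void nvme_sync_io_queues_krefs(struct nvme_ctrl *ctrl)
{
	struct nvme_ns *ns, **namespaces;
	unsigned int count = 0, i = 0;

	down_read(&ctrl->namespaces_rwsem);
	list_for_each_entry(ns, &ctrl->namespaces, list)
		count++;
	/* Allocate under the rwsem so the count stays valid. */
	namespaces = kcalloc(count, sizeof(*namespaces), GFP_NOIO);
	if (!namespaces) {
		up_read(&ctrl->namespaces_rwsem);
		return;
	}
	list_for_each_entry(ns, &ctrl->namespaces, list) {
		/* Hold a reference so the ns is not freed outside the lock. */
		kref_get(&ns->kref);
		namespaces[i++] = ns;
	}
	up_read(&ctrl->namespaces_rwsem);

	/* Sync the queues without holding namespaces_rwsem. */
	for (i = 0; i < count; i++) {
		blk_sync_queue(namespaces[i]->queue);
		nvme_put_ns(namespaces[i]);
	}
	kfree(namespaces);
}

The extra allocation and per-namespace reference counting make this heavier
than it needs to be, though.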

Following a suggestion from Johannes, I created the patch below instead and
confirmed that it avoids the lockdep splat. It moves the elements of
ctrl->namespaces to a list head on the stack, so that blk_sync_queue() can be
called outside of the ctrl->namespaces_rwsem critical section. Then the list
elements are put back onto ctrl->namespaces. I think this temporary list move
is fine since the controller is in the process of being reset, stopped or torn
down, but this point needs confirmation from NVMe experts.

Keith, what do you think?

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 66446f1e06c..6ed68bb6008 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -5120,11 +5120,27 @@ EXPORT_SYMBOL_GPL(nvme_start_admin_queue);
 void nvme_sync_io_queues(struct nvme_ctrl *ctrl)
 {
 	struct nvme_ns *ns;
+	LIST_HEAD(splice);
 
-	down_read(&ctrl->namespaces_rwsem);
-	list_for_each_entry(ns, &ctrl->namespaces, list)
+	/*
+	 * Calling blk_sync_queue() inside the ctrl->namespaces_rwsem critical
+	 * section triggers a lockdep deadlock warning: cancel_work_sync() in
+	 * blk_sync_queue() waits for nvme_timeout() work completion, which may
+	 * take ctrl->namespaces_rwsem. To avoid the deadlock possibility, call
+	 * blk_sync_queue() outside of the critical section by temporarily
+	 * moving the ctrl->namespaces list elements to a list head on the stack.
+	 */
+
+	down_write(&ctrl->namespaces_rwsem);
+	list_splice_init(&ctrl->namespaces, &splice);
+	up_write(&ctrl->namespaces_rwsem);
+
+	list_for_each_entry(ns, &splice, list)
 		blk_sync_queue(ns->queue);
-	up_read(&ctrl->namespaces_rwsem);
+
+	down_write(&ctrl->namespaces_rwsem);
+	list_splice(&splice, &ctrl->namespaces);
+	up_write(&ctrl->namespaces_rwsem);
 }
 EXPORT_SYMBOL_GPL(nvme_sync_io_queues);
 


-- 
Shin'ichiro Kawasaki

