[PATCH RFC] nvme: avoid deadlock warning in sync_io_queues() path

Shinichiro Kawasaki shinichiro.kawasaki at wdc.com
Wed Dec 13 23:27:31 PST 2023


Thanks for the comments.

On Dec 13, 2023 / 16:22, Sagi Grimberg wrote:
> 
> > > The commit c0feea594e05 ("workqueue: don't skip lockdep work dependency
> > > in cancel_work_sync()") restored a lockdep check and it triggered
> > > deadlock warning in sync_io_queues() path. This function locks
> > > namespaces_rwsem for read to traverse the namespace list and calls
> > > blk_sync_queue() for each namespace. blk_sync_queue() calls
> > > cancel_work_sync() to wait for nvme_timeout() completion which has paths
> > > to lock the namespaces_rwsem again for read. These two locks are both
> > > for read, but it can cause deadlock when another task has a lock for
> > > write. Hence, the deadlock warning was reported.
> > > 
> > > To avoid the deadlock warning, guard the namespace list by SRCU in place
> > > of the namespaces_rwsem. This avoids the deadlock by the two read locks
> > > since writes by other tasks do not stop the reads.
> > 
> > Based on the report this is triggered from the pci shutdown code path?

No. I added a debug print in nvme_shutdown(), but it was not printed when
the lockdep is printed. The test case accesses directly to PCI register to
disable bus master. It just makes I/Os timeout, but does not call
nvme_shutdown().

> > I think the problem is that there is a dev disable running together
> > with a reset (driven from the timeout handler). Perhaps this should
> > be addressed.

Could you elaborate the sentence above? I find nvme_timeout() calls both
nvme_dev_disable() and nvme_try_sched_reset() for reset. Do you mean
nvme_timeout() should not call both the two functions?

> > 
> > I don't think that making the namespaces lock an srcu helps us here.
> 
> Just read the correspondence on the report. If this is just lockdep
> false positive, then I don't think we need to change the namespaces
> rwsem.
> 
> Can't there be some annotation that we can do to teach lockdep that
> this one is a false positive?

I once suggested a simple trick by lockdep_off/on() in nvme_sync_io_queues(). It
suppresses the WARN, but was not welcomed [*].

  [*] https://lore.kernel.org/linux-block/63e14e00-e877-cb5a-a69a-ff44bed8b5a5@I-love.SAKURA.ne.jp/

Dynamic lockdep key can annotate locks if locks have multiple instances. I
tried it locally but it did not work. All three locks relevant to the lockdep
WARN reported have single instance. So far, I have no other idea to annotate the
lockdep.


More information about the Linux-nvme mailing list