[PATCH RFC] nvme: avoid deadlock warning in sync_io_queues() path
Shinichiro Kawasaki
shinichiro.kawasaki at wdc.com
Wed Dec 13 23:27:31 PST 2023
Thanks for the comments.
On Dec 13, 2023 / 16:22, Sagi Grimberg wrote:
>
> > > The commit c0feea594e05 ("workqueue: don't skip lockdep work dependency
> > > in cancel_work_sync()") restored a lockdep check and it triggered
> > > deadlock warning in sync_io_queues() path. This function locks
> > > namespaces_rwsem for read to traverse the namespace list and calls
> > > blk_sync_queue() for each namespace. blk_sync_queue() calls
> > > cancel_work_sync() to wait for nvme_timeout() completion which has paths
> > > to lock the namespaces_rwsem again for read. These two locks are both
> > > for read, but it can cause deadlock when another task has a lock for
> > > write. Hence, the deadlock warning was reported.
> > >
> > > To avoid the deadlock warning, guard the namespace list by SRCU in place
> > > of the namespaces_rwsem. This avoids the deadlock by the two read locks
> > > since writes by other tasks do not stop the reads.
> >
> > Based on the report this is triggered from the pci shutdown code path?
No. I added a debug print in nvme_shutdown(), but it was not printed when
the lockdep is printed. The test case accesses directly to PCI register to
disable bus master. It just makes I/Os timeout, but does not call
nvme_shutdown().
> > I think the problem is that there is a dev disable running together
> > with a reset (driven from the timeout handler). Perhaps this should
> > be addressed.
Could you elaborate the sentence above? I find nvme_timeout() calls both
nvme_dev_disable() and nvme_try_sched_reset() for reset. Do you mean
nvme_timeout() should not call both the two functions?
> >
> > I don't think that making the namespaces lock an srcu helps us here.
>
> Just read the correspondence on the report. If this is just lockdep
> false positive, then I don't think we need to change the namespaces
> rwsem.
>
> Can't there be some annotation that we can do to teach lockdep that
> this one is a false positive?
I once suggested a simple trick by lockdep_off/on() in nvme_sync_io_queues(). It
suppresses the WARN, but was not welcomed [*].
[*] https://lore.kernel.org/linux-block/63e14e00-e877-cb5a-a69a-ff44bed8b5a5@I-love.SAKURA.ne.jp/
Dynamic lockdep key can annotate locks if locks have multiple instances. I
tried it locally but it did not work. All three locks relevant to the lockdep
WARN reported have single instance. So far, I have no other idea to annotate the
lockdep.
More information about the Linux-nvme
mailing list