[PATCH] nvme-pci: silence a lockdep complaint

Thu May 23 08:02:30 PDT 2024

On Thu, May 23, 2024 at 04:45:37PM +0300, Sagi Grimberg wrote:
> 
> 
> On 23/05/2024 16:19, Christoph Hellwig wrote:
> > On Thu, May 23, 2024 at 04:02:43PM +0300, Sagi Grimberg wrote:
> > > I just want to have good testing before we touch these areas that
> > > are all done in ctrl reset and error recovery which historically manifest
> > > most of the issues...
> > > 
> > > and all this because lockdep complains on false-positives...
> > I still haven't seen any proof that it is a false-positive.
> > 
> 
> The explanation is that the nvme-pci timeout handler will only disable the
> device when either
> it is able to transition the ctrl state to RESETTING or if the ctrl state is
> terminal, and in both
> cases the reset_work is not running concurrently...
> 
> It was also proposed moving the call to nvme_disable_ctrl to a different
> context (returning BLK_RESET_TIMER from
> the timeout handler) such that blk_sync_queue() does not depend on it. But
> Keith said it won't solve the issue.
> 
> Keith spent more time thinking about this, so I'll defer to him. I just want
> it to stop bothering Shinichiro and
> others running blktests regularly...

Lockdep is complaining about the namespaces_rwsem taken from both
timeout_work and reset_work.

The only time namespaces_rwsem is taken from timeout work is if
timeout_work successfully transitions controller state to RESETTING. But
the controller is already in the RESETTING state because reset_work
wouldn't be running here if we weren't. So timeout_work never locks the
rwsem if reset_work is already attempting to "cancel" that work.

Nevermind the fact that both are using the read lock, so it's not even a
problem if they both attempt to lock it. The only potential problem
could be if someone requests the write lock inbetween. At nearly the
same time, pciehp has to transition the state to DEAD in order for the
special "terminal" condition to apply for the timeout_work to attempt
some kind of recovery, but the removal doesn't take the write lock until
after the reset_work is flushed, so how would the write lock sneak in?

A potential interaction with scan_work while the timeout, reset, and
hotplug run concurrently might be the only part I'm not 100% sure about,
but I haven't found any concerning path so far.