[PATCHv2] nvme: use srcu for iterating namespace list

Shinichiro Kawasaki shinichiro.kawasaki at wdc.com
Sun May 26 17:33:35 PDT 2024


On May 24, 2024 / 18:29, Sagi Grimberg wrote:
> 
> 
> On 24/05/2024 7:41, Shinichiro Kawasaki wrote:
> > On May 23, 2024 / 10:20, Keith Busch wrote:
> > > From: Keith Busch <kbusch at kernel.org>
> > > 
> > > The nvme pci driver synchronizes with all the namespace queues during a
> > > reset to ensure that there's no pending timeout work.
> > > 
> > > Meanwhile the timeout work potentially iterates those same namespaces to
> > > freeze their queues.
> > > 
> > > Each of those namespace iterations use the same read lock. If a write
> > > lock should somehow get between the synchronize and freeze steps, then
> > > forward progress is deadlocked.
> > > 
> > > We had been relying on the nvme controller state machine to ensure the
> > > reset work wouldn't conflict with timeout work. That guarantee may be a
> > > bit fragile to rely on, so iterate the namespace lists without taking a
> > > lock to fix potential circular locks, as reported by lockdep.
> > > 
> > > Link: https://lore.kernel.org/all/20220930001943.zdbvolc3gkekfmcv@shindev/
> > > Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki at wdc.com>
> > > Signed-off-by: Keith Busch <kbusch at kernel.org>
> > Keith, thank you very much for the patch. Sagi, Christoph, thank you for the
> > discussion.
> > 
> > I confirmed this patch avoids the lockdep WARN that I reported. I also ran my
> > test set with the patch and observed no regression. Looks good.
> > 
> > Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki at wdc.com>
> 
> I'm assuming that all blktests ran with this change? including fabrics and
> permutations?

Yes. To be precise, I ran with the following conditions:

- target groups: block dm loop nbd nvme scsi srp ublk zbd
- NVMET_TRTYPES="loop rdma tcp fc"
- NVMET_BLKDEV_TYPES: default "device file"
- TEST_DEVS: QEMU nvme device
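
For anyone who wants to repeat the run, the conditions above could be
expressed roughly as the blktests "config" file below. This is only a sketch:
the device path /dev/nvme0n1 is an assumed placeholder for the QEMU nvme
device, not the path actually used, and the group list in the trailing comment
simply mirrors the target groups listed above.

```shell
# blktests "config" file (sourced by ./check as bash).
# Sketch only; adjust to the local setup.

# Assumption: the QEMU-emulated NVMe namespace appears as /dev/nvme0n1.
TEST_DEVS=(/dev/nvme0n1)

# Transports to exercise for the nvme target tests.
NVMET_TRTYPES="loop rdma tcp fc"

# Backing store types for the nvme target; "device file" is the default,
# so this line only makes the default explicit.
NVMET_BLKDEV_TYPES="device file"

# Then run the listed test groups from the blktests top directory:
#   ./check block dm loop nbd nvme scsi srp ublk zbd
```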

I observed some failures, but all of them were known failures.
Keith has sent out the v3 patch, so I will run my test set again with it.


