blktests failures with v5.19-rc1
Bjorn Helgaas
helgaas at kernel.org
Wed Jun 15 12:47:27 PDT 2022
On Tue, Jun 14, 2022 at 04:00:45AM +0000, Shinichiro Kawasaki wrote:
> On Jun 14, 2022 / 02:38, Chaitanya Kulkarni wrote:
> > Shinichiro,
> >
> > On 6/13/22 19:23, Keith Busch wrote:
> > > On Tue, Jun 14, 2022 at 01:09:07AM +0000, Shinichiro Kawasaki wrote:
> > >> (CC+: linux-pci)
> > >> On Jun 11, 2022 / 16:34, Yi Zhang wrote:
> > >>> On Fri, Jun 10, 2022 at 10:49 PM Keith Busch <kbusch at kernel.org> wrote:
> > >>>>
> > >>>> And I am not even sure this is real. I don't know yet why
> > >>>> this is showing up only now, but this should fix it:
> > >>>
> > >>> Hi Keith
> > >>>
> > >>> Confirmed the WARNING issue was fixed with the change, here is
> > >>> the log:
> > >>
> > >> Thanks. I also confirmed that Keith's change to add
> > >> __ATTR_IGNORE_LOCKDEP to dev_attr_dev_rescan avoids the WARNING, on
> > >> v5.19-rc2.
> > >>
> > >> I took a closer look into this issue and found that the deadlock
> > >> WARN can be recreated with the following two commands:
> > >>
> > >> # echo 1 > /sys/bus/pci/devices/0000\:00\:09.0/rescan
> > >> # echo 1 > /sys/bus/pci/devices/0000\:00\:09.0/remove
> > >>
> > >> And it can be recreated with PCI devices other than an NVMe
> > >> controller, such as a SCSI controller or a VGA controller, so
> > >> this is not a storage sub-system issue.
> > >>
> > >> I checked the function call stacks of the two commands above. As
> > >> shown below, it looks like a possible ABBA deadlock is
> > >> detected and warned about.
> > >
> > > Yeah, I was mistaken on this report, so my proposal to suppress
> > > the warning is definitely not right. If I run both 'echo'
> > > commands in parallel, I see it deadlock frequently. I'm not
> > > familiar enough with this code to have any good ideas on how to
> > > fix it, but I agree this is a generic PCI issue.
> >
> > I think it is worth adding a testcase to blktests to make sure
> > future releases are tested for this.
>
> Yeah, this WARN is confusing for us, so it would be valuable to
> test for it in blktests to avoid repeating it. One thing I wonder
> is which test group the test case would fall in. The nvme group is
> probably the one to add it to.
>
> Another thing I wonder about is kernel test suites other than
> blktests. Don't we have a more appropriate test suite to check the
> PCI device rescan/remove race? Such a test sounds more like a PCI
> bus sub-system test than a block/storage test.
I'm not aware of such a test, but it would be nice to have one.
Can you share your qemu config so I can reproduce this locally?
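For reference, the parallel reproduction Keith describes above could be
scripted roughly as follows. This is only a sketch under assumptions: the
function name race_rescan_remove and the SYSFS_PCI override are
hypothetical names introduced here for illustration, and it assumes a
lockdep-enabled kernel in a disposable VM, run as root.

```shell
#!/bin/sh
# Sketch: race a PCI device's 'rescan' and 'remove' sysfs writes.
# SYSFS_PCI is a hypothetical override so the function can be pointed
# at a scratch directory; by default it uses the real sysfs path.
SYSFS_PCI=${SYSFS_PCI:-/sys/bus/pci/devices}

race_rescan_remove() {
    # $1 is the device address, e.g. 0000:00:09.0 as used in this thread.
    dev="$SYSFS_PCI/$1"
    if [ ! -d "$dev" ]; then
        echo "no such device: $1" >&2
        return 1
    fi

    # Start both writes in the background so they can overlap.
    echo 1 > "$dev/rescan" &
    pid_rescan=$!
    echo 1 > "$dev/remove" &
    pid_remove=$!

    # If the two paths deadlock, these waits never return.
    wait "$pid_rescan"
    wait "$pid_remove"
    return 0
}
```

Calling `race_rescan_remove 0000:00:09.0` races the two writes once.
Wrapping the call in a loop, with `echo 1 > /sys/bus/pci/rescan` in
between to bring the device back, should make the window easier to hit.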
Thanks for finding and reporting this!
Bjorn