[PATCH 0/4] KVM: arm64: vgic: Locking fixes
Jean-Philippe Brucker
jean-philippe at linaro.org
Wed May 24 05:40:42 PDT 2023
On Fri, May 19, 2023 at 09:46:45AM +0100, Marc Zyngier wrote:
> On Thu, 18 May 2023 11:09:14 +0100,
> Jean-Philippe Brucker <jean-philippe at linaro.org> wrote:
> >
> > Another fun locking puzzle, between the new config_lock and srcu.
> > Patch 1 attempts to fix it, and the other patches fix simpler issues.
>
> Thanks for that and for your excellent description of the problems.
>
> > I got these lockdep reports while running KVM QEMU on a TCG QEMU, but it
> > can also be triggered by running the vgic_irq kselftest on TCG QEMU.
> > Now, with the fix and lockdep enabled, vgic_irq hangs but I believe it's
> > an unrelated weirdness: if I introduce a separate lockdep warning for
> > some made up locks, then the test passes again. So I'm sending this out
> > now for discussion, and will investigate that one later.
>
> I've taken these patches for a spin, and I cannot reproduce this hang,
> though I'm running on actual HW and not QEMU. It would be really
> annoying if lockdep actively introduced issues... :-/
>
> Any chance you could dig into this as you have a good reproducer? I'll
> try to set up a TCG environment on my end as well.

I'm not sure this is a fixable problem; I didn't find anything obviously
wrong. It's just that when running with lockdep and KASAN (I forgot to
mention that one, sorry) on an emulated platform, some tests can't make
progress. In this config I can boot a guest without problems, but this
particular test gets stuck when trying to inject 32 interrupts. What I see is:
1. KVM prepares to enter the guest and spends a lot of time dealing with the
   vGIC with interrupts disabled.
2. Just before entering the guest it sees ISR_EL1 set (the timer interrupt
   is pending), so it cancels the return to the guest, reprograms the timer
   and goes back to 1.
For this particular test, which tries to inject several interrupts
simultaneously, I think kvm_vgic_flush_hwstate() needs to take and release
lots of locks, which takes forever with lockdep. I measure about 4ms inside
that function, which corresponds to the timer period at HZ_250.
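To illustrate the arithmetic, here is a rough toy model of the loop described
in steps 1 and 2. It is not the real KVM entry path: tick_period_us() and
attempts_to_enter() are made-up names, and the model assumes that handling the
timer interrupt itself costs no time. It only shows that once the preparation
time reaches the tick period (1000000 / 250 = 4000us), every attempt sees a
pending tick and the guest is never entered.

```c
#include <stdint.h>

/* Tick period in microseconds for a given HZ, e.g. 250 -> 4000. */
static uint64_t tick_period_us(unsigned int hz)
{
	return 1000000ULL / hz;
}

/*
 * Toy model, NOT the real KVM code. Each attempt spends prep_us with
 * interrupts disabled (standing in for the vGIC flush). If a tick boundary
 * falls inside that window, the timer interrupt is pending, so the entry is
 * cancelled and retried. Returns the number of attempts needed to enter the
 * guest, or 0 if it never enters within max_attempts.
 */
static unsigned int attempts_to_enter(uint64_t prep_us, unsigned int hz,
				      uint64_t start_us,
				      unsigned int max_attempts)
{
	uint64_t period = tick_period_us(hz);
	uint64_t now = start_us;	/* time since boot, hypothetical */
	unsigned int i;

	for (i = 1; i <= max_attempts; i++) {
		uint64_t end = now + prep_us;

		/* No tick boundary in (now, end]: nothing pending, enter. */
		if (end / period == now / period)
			return i;

		/* Tick pending: cancel the entry and retry from here. */
		now = end;
	}

	return 0;
}
```

Under these assumptions, attempts_to_enter(4000, 250, 0, 1000) returns 0 (the
guest is never entered), while a 3000us preparation starting at time 0 enters
on the first attempt.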
Previously, I guess getting a lockdep warning would disable lockdep and
allow the test to make progress.
So maybe the vGIC could still be optimized, or maybe this isn't worth
fixing and we could just say that this setup is too slow to reliably run KVM
and leave it at that.
Thanks,
Jean