KVM: Nested VGIC emulation leads to infinite IRQ exceptions
Marc Zyngier
maz at kernel.org
Thu Oct 2 07:28:19 PDT 2025
On Thu, 02 Oct 2025 13:29:42 +0100,
Volodymyr Babchuk <Volodymyr_Babchuk at epam.com> wrote:
>
> Xen wants to return back to vvCPU:
>
> qemu-system-aar-3378 [085] ..... 246.770716: kvm_inject_nested_exception: IRQ: esr_el2 0x0 elr_el2: 0xffffffc0010e5508 spsr_el2: 0x024000c5 (M: EL1h) hcr_el2: 807c663f
> qemu-system-aar-3378 [085] ..... 246.770716: kvm_get_timer_map: VCPU: 1, dv: 2, dp: 3, ev: 1, ep: 0
> qemu-system-aar-3378 [085] ..... 246.770716: kvm_timer_update_irq: VCPU: 1, IRQ 28, level 0
> qemu-system-aar-3378 [085] ..... 246.770716: vgic_update_irq_pending: VCPU: 1, IRQ 28, level: 0
> qemu-system-aar-3378 [085] ..... 246.770717: kvm_timer_update_irq: VCPU: 1, IRQ 26, level 1
>
>
> We have pending timer IRQ for Xen
>
> qemu-system-aar-3378 [085] ..... 246.770717: vgic_update_irq_pending: VCPU: 1, IRQ 26, level: 1
> qemu-system-aar-3378 [085] d.... 246.770717: kvm_timer_restore_state: CTL: 0x000000 CVAL: 0x0 arch_timer_ctx_index: 2
> qemu-system-aar-3378 [085] d.... 246.770717: kvm_timer_restore_state: CTL: 0x000005 CVAL: 0x3e6c59a71a95 arch_timer_ctx_index: 3
> qemu-system-aar-3378 [085] ..... 246.770717: kvm_timer_emulate: arch_timer_ctx_index: 1 (should_fire: 1)
> qemu-system-aar-3378 [085] ..... 246.770718: kvm_timer_emulate: arch_timer_ctx_index: 0 (should_fire: 0)
> qemu-system-aar-3378 [085] d.... 246.770719: vgic_update_irq_pending: VCPU: 1, IRQ 25, level: 0
>
> But we also have bunch of ACTIVE interrupts which fill all available
> LRs:
>
> qemu-system-aar-3378 [085] d.... 246.770720: vgic_populate_lr: VCPU 1 lr 0 = 90a000000000004f
> qemu-system-aar-3378 [085] d.... 246.770720: vgic_populate_lr: VCPU 1 lr 1 = 90a000000000004e
> qemu-system-aar-3378 [085] d.... 246.770720: vgic_populate_lr: VCPU 1 lr 2 = d0a000000000004a
> qemu-system-aar-3378 [085] d.... 246.770720: vgic_populate_lr: VCPU 1 lr 3 = d0a000000000004b
>
> As all LR entries have ACTIVE bit set, read from IAR1 will produce 1023,
> of course. Problem is that Xen itself can't deactivate these 4 IRQs as
> they are directed to DomU, so DomU should active them first. But DomU
> can't do this as it is never executed.

There is a flaw in your reasoning: if these are DomU (an L2 guest)
interrupts, why would they impact Xen itself, which is L1? At the
point of entering Xen, the HW LRs should only contain the virtual
interrupts that are targeting Xen, and nothing else (the DomU
interrupts being stored in the shadow LRs).

I can't see so far how we'd end up in that situation, given that we do
a full context switch of the vgic context on each EL1/EL2 transition.
Unless you are actually acknowledging the DomU interrupts in Xen and
injecting them back into DomU? Which seems very odd as you don't have
the HW bit set, which I'd expect if that was the case...
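
For reference, the LR values from the trace above can be decoded by hand.
The snippet below is only an illustrative sketch, assuming the
ICH_LR<n>_EL2 field layout from the GICv3 architecture (State in bits
[63:62], HW in bit 61, Group in bit 60, Priority in bits [55:48], pINTID
in bits [44:32], vINTID in bits [31:0]). The helper name decode_lr is
made up for this example and is not a KVM function.

```python
# Illustrative decoder for GICv3 ICH_LR<n>_EL2 values (not KVM code).
LR_STATES = {0: "invalid", 1: "pending", 2: "active", 3: "pending+active"}


def decode_lr(value):
    """Split a 64-bit list register value into its architectural fields."""
    return {
        "state": LR_STATES[(value >> 62) & 0x3],  # bits [63:62]
        "hw": bool((value >> 61) & 0x1),          # bit 61
        "group": (value >> 60) & 0x1,             # bit 60
        "priority": (value >> 48) & 0xFF,         # bits [55:48]
        "pintid": (value >> 32) & 0x1FFF,         # bits [44:32]
        "vintid": value & 0xFFFFFFFF,             # bits [31:0]
    }


# The four values reported by vgic_populate_lr in the trace.
TRACE_LRS = [
    0x90A000000000004F,
    0x90A000000000004E,
    0xD0A000000000004A,
    0xD0A000000000004B,
]

if __name__ == "__main__":
    for idx, raw in enumerate(TRACE_LRS):
        f = decode_lr(raw)
        print(f"lr {idx}: vintid={f['vintid']} state={f['state']} "
              f"hw={f['hw']} group={f['group']} prio={f['priority']:#x}")
```

Under that layout, lr 0 and lr 1 are active only, lr 2 and lr 3 are
pending+active, and none of the four has the HW bit set.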

> I am not sure what is the correct fix, but I see two options:
>
> - Prioritize timer IRQs so they always present in LRs
> - De-prioritize ACTIVE IRQs so they are inserted into LRs last.
>
> Looks like the second one is better.

That's indeed something missing in KVM (I have long waited until
someone would do it in my stead, but nobody seems to be bothered) but
it isn't clear, from what you are describing, that this is the actual
solution to your problem.
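
To make the second option concrete, here is a rough sketch of what
"active-only interrupts go last" could look like as an ordering rule.
This is not KVM's code: the VIrq type, lr_sort_key and pick_for_lrs are
hypothetical names, and the priority given to IRQ 26 is an assumption
made purely for the example.

```python
from dataclasses import dataclass


@dataclass
class VIrq:
    """Hypothetical, simplified view of a virtual interrupt."""
    intid: int
    priority: int  # lower value means higher priority, as in the GIC
    pending: bool
    active: bool


def lr_sort_key(irq):
    # Anything that is pending (including pending+active) sorts first.
    # Active-only interrupts cannot be acknowledged again, so they sort
    # last. Ties are broken by priority, then by INTID for determinism.
    return (0 if irq.pending else 1, irq.priority, irq.intid)


def pick_for_lrs(irqs, nr_lrs):
    """Return the interrupts that would be given a list register."""
    return sorted(irqs, key=lr_sort_key)[:nr_lrs]


if __name__ == "__main__":
    irqs = [
        VIrq(79, 0xA0, pending=False, active=True),
        VIrq(78, 0xA0, pending=False, active=True),
        VIrq(74, 0xA0, pending=True, active=True),
        VIrq(75, 0xA0, pending=True, active=True),
        VIrq(26, 0xA0, pending=True, active=False),  # priority assumed
    ]
    print([irq.intid for irq in pick_for_lrs(irqs, nr_lrs=4)])
```

With four LRs, this ordering keeps the pending timer interrupt in the
selection and leaves out one of the active-only interrupts instead.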

Thanks,

M.
--
Without deviation from the norm, progress is not possible.