KVM: Nested VGIC emulation leads to infinite IRQ exceptions

Volodymyr Babchuk Volodymyr_Babchuk at epam.com
Thu Oct 2 05:29:42 PDT 2025


Hi Marc,

Marc Zyngier <maz at kernel.org> writes:

> Please use the kvmarm mailing list for KVM related discussions (added
> for your convenience).

Oops, sorry. I missed that MAINTAINERS has two "L:" entries.

> On Tue, 30 Sep 2025 22:11:54 +0100,
> Volodymyr Babchuk <Volodymyr_Babchuk at epam.com> wrote:
>> 
>> 
>> Hi all,
>> 
>> We are trying to run Xen as KVM nested hypervisor (again!) and we have
>> encountered strange issue with GIC nested emulation. I am certain that
>> we'll dig to the root cause, but probably someone on the ML will save us
>> a couple of days of debugging by providing with some insights.
>> 
>> So, setup is following: QEMU 9.2 is running Xen 4.20 with KVM (latest
>> Linux master branch) as accelerator.
>
> 9.2 is an odd choice, specially as it doesn't have any NV support.
> ISTR that 10.1 is the first version to have some NV support, although
> without E2H0 enablement which I expect Xen requires.

Yep, I had to patch QEMU to enable E2H0 (among other things).

>
> Anyway, if you're already running something, then I expect you're
> patched QEMU to death to get there.

You are certainly correct.

[...]

>
> To help you further, I'd need a reproducer. I've asked you more than
> once to provide a way to reproduce your setup, but got no answer. The
> Debian package doesn't boot (it just messes up grub), and I don't have
> the time to learn how to deal with Xen from scratch.

The current setup is quite complex, as it involves a whole Android build,
so there is no easy way to share a reproducer.

> Until then, you'll have to apply some debugging by yourself.

This is what Dmytro and I are doing, and it looks like I found the
problem. I added some more traces, and here we go:

Xen wants to return to the vvCPU:

 qemu-system-aar-3378    [085] .....   246.770716: kvm_inject_nested_exception: IRQ: esr_el2 0x0 elr_el2: 0xffffffc0010e5508 spsr_el2: 0x024000c5 (M: EL1h) hcr_el2: 807c663f
 qemu-system-aar-3378    [085] .....   246.770716: kvm_get_timer_map: VCPU: 1, dv: 2, dp: 3, ev: 1, ep: 0
 qemu-system-aar-3378    [085] .....   246.770716: kvm_timer_update_irq: VCPU: 1, IRQ 28, level 0
 qemu-system-aar-3378    [085] .....   246.770716: vgic_update_irq_pending: VCPU: 1, IRQ 28, level: 0
 qemu-system-aar-3378    [085] .....   246.770717: kvm_timer_update_irq: VCPU: 1, IRQ 26, level 1


We have a pending timer IRQ for Xen:

 qemu-system-aar-3378    [085] .....   246.770717: vgic_update_irq_pending: VCPU: 1, IRQ 26, level: 1
 qemu-system-aar-3378    [085] d....   246.770717: kvm_timer_restore_state: CTL: 0x000000 CVAL:              0x0 arch_timer_ctx_index: 2
 qemu-system-aar-3378    [085] d....   246.770717: kvm_timer_restore_state: CTL: 0x000005 CVAL:   0x3e6c59a71a95 arch_timer_ctx_index: 3
 qemu-system-aar-3378    [085] .....   246.770717: kvm_timer_emulate: arch_timer_ctx_index: 1 (should_fire: 1)
 qemu-system-aar-3378    [085] .....   246.770718: kvm_timer_emulate: arch_timer_ctx_index: 0 (should_fire: 0)
 qemu-system-aar-3378    [085] d....   246.770719: vgic_update_irq_pending: VCPU: 1, IRQ 25, level: 0
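
For reference, the CTL values in the kvm_timer_restore_state lines can
be decoded as below. This is only a sketch, assuming the architectural
CNT*_CTL layout (ENABLE in bit 0, IMASK in bit 1, ISTATUS in bit 2); the
helper names are made up for illustration and are not kernel functions.

```python
def decode_timer_ctl(ctl):
    """Split a generic timer CTL value into its three flag bits."""
    return {
        "enable": bool(ctl & 0x1),   # bit 0: timer enabled
        "imask": bool(ctl & 0x2),    # bit 1: interrupt masked
        "istatus": bool(ctl & 0x4),  # bit 2: timer condition met
    }


def asserts_irq(ctl):
    """True if the timer is enabled, unmasked and its condition is met."""
    flags = decode_timer_ctl(ctl)
    return flags["enable"] and flags["istatus"] and not flags["imask"]


# CTL values copied from the two kvm_timer_restore_state lines above.
for ctl in (0x000000, 0x000005):
    print(hex(ctl), decode_timer_ctl(ctl), "asserts IRQ:", asserts_irq(ctl))
```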

But we also have a bunch of ACTIVE interrupts which fill all available
LRs:

 qemu-system-aar-3378    [085] d....   246.770720: vgic_populate_lr: VCPU 1 lr 0 = 90a000000000004f
 qemu-system-aar-3378    [085] d....   246.770720: vgic_populate_lr: VCPU 1 lr 1 = 90a000000000004e
 qemu-system-aar-3378    [085] d....   246.770720: vgic_populate_lr: VCPU 1 lr 2 = d0a000000000004a
 qemu-system-aar-3378    [085] d....   246.770720: vgic_populate_lr: VCPU 1 lr 3 = d0a000000000004b

As all LR entries have the ACTIVE bit set, a read from IAR1 will return
1023 (the spurious INTID), of course. The problem is that Xen itself
can't deactivate these 4 IRQs, as they are directed to DomU, so DomU
should deactivate them first. But DomU can't do this, as it is never
executed.
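
The state of each LR can be read directly from the traced values. Here
is a minimal sketch of the decoding, assuming the GICv3 ICH_LR<n>_EL2
layout (State in bits [63:62], HW in bit 61, Group in bit 60, Priority
in bits [55:48], vINTID in bits [31:0]). The function and variable names
are hypothetical.

```python
STATES = {
    0b00: "invalid",
    0b01: "pending",
    0b10: "active",
    0b11: "pending+active",
}


def decode_lr(value):
    """Extract the main fields of a 64-bit list register value."""
    return {
        "state": STATES[(value >> 62) & 0x3],
        "hw": (value >> 61) & 0x1,
        "group": (value >> 60) & 0x1,
        "priority": (value >> 48) & 0xFF,
        "vintid": value & 0xFFFFFFFF,
    }


# Values copied from the vgic_populate_lr lines above.
TRACED_LRS = [
    0x90A000000000004F,
    0x90A000000000004E,
    0xD0A000000000004A,
    0xD0A000000000004B,
]

for n, lr in enumerate(TRACED_LRS):
    print("lr", n, decode_lr(lr))
```

None of the four entries is in the plain "pending" state, so there is
nothing left for the guest hypervisor to acknowledge.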

I am not sure what the correct fix is, but I see two options:

- Prioritize timer IRQs so they are always present in LRs
- De-prioritize ACTIVE IRQs so they are inserted into LRs last.

It looks like the second one is better.
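
To illustrate the second option, here is a toy model (not the kernel's
code) of choosing which IRQs go into a limited number of LRs. The Irq
class, the two sort keys and the priority values are assumptions made
for this sketch; only the INTIDs and states are taken from the trace.

```python
from dataclasses import dataclass

NR_LRS = 4  # four LRs, as in the vgic_populate_lr trace


@dataclass(frozen=True)
class Irq:
    intid: int
    pending: bool
    active: bool
    priority: int = 0xA0  # assumed value; lower means higher priority


def active_first(irq):
    """Ordering that puts any IRQ with the active bit set first."""
    return (0 if irq.active else 1, irq.priority)


def acknowledgeable_first(irq):
    """Ordering that puts pending-only IRQs (which can be acked) first."""
    can_ack = irq.pending and not irq.active
    return (0 if can_ack else 1, irq.priority)


def pick_for_lrs(irqs, key):
    """Return the INTIDs that fit into the LRs under the given ordering."""
    return [irq.intid for irq in sorted(irqs, key=key)[:NR_LRS]]


irqs = [
    Irq(79, pending=False, active=True),
    Irq(78, pending=False, active=True),
    Irq(74, pending=True, active=True),
    Irq(75, pending=True, active=True),
    Irq(26, pending=True, active=False),  # the timer IRQ for Xen
]

print("active first:         ", pick_for_lrs(irqs, active_first))
print("acknowledgeable first:", pick_for_lrs(irqs, acknowledgeable_first))
```

With the first ordering IRQ 26 never reaches an LR; with the second one
it takes the first slot and one of the active IRQs is left out instead.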


-- 
WBR, Volodymyr

