[PATCH v2 29/45] KVM: arm64: GICv3: Set ICH_HCR_EL2.TDIR when interrupts overflow LR capacity
Marc Zyngier
maz at kernel.org
Mon Nov 24 05:40:35 PST 2025
On Mon, 24 Nov 2025 13:23:08 +0000,
Mark Brown <broonie at kernel.org> wrote:
>
> On Mon, Nov 24, 2025 at 01:06:29PM +0000, Marc Zyngier wrote:
> > Mark Brown <broonie at kernel.org> wrote:
>
> > > FWIW I am seeing this on i.MX8MP (4xA53+GICv3):
>
> > > https://lava.sirena.org.uk/scheduler/job/2118713#L1044
>
> > There are worrying errors way before that, in the VMID allocator init,
> > and I can't see what the GIC has to do with it. The issue Fuad
> reported was at run time, not boot time, so this really doesn't align
> > with what you are seeing.
>
> Yeah, I was just looking further and realising it was probably
> different - sorry about that. After seeing the qemu issue he was
> hitting, I was checking what else was failing; none of the platforms
> are booting one way or another. FWIW, with earlycon, the AM625 is
> showing similar issues to the i.MX8MP.
That's the initial warning:
WARN_ON(NUM_USER_VMIDS - 1 <= num_possible_cpus());
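For context, that check lives in kvm_arm_vmid_alloc_init() in
arch/arm64/kvm/vmid.c. A paraphrased sketch from memory, not a verbatim
quote of what is in -next:

#define VMID_FIRST_VERSION	(1UL << kvm_arm_vmid_bits)
#define NUM_USER_VMIDS		VMID_FIRST_VERSION /* 256 with 8-bit VMIDs */

int __init kvm_arm_vmid_alloc_init(void)
{
	kvm_arm_vmid_bits = kvm_get_vmid_bits();

	/*
	 * Expect allocation after rollover to fail if we don't have
	 * at least one more VMID than CPUs. VMID #0 is reserved.
	 */
	WARN_ON(NUM_USER_VMIDS - 1 <= num_possible_cpus());

	atomic64_set(&vmid_generation, VMID_FIRST_VERSION);
	vmid_map = bitmap_zalloc(NUM_USER_VMIDS, GFP_KERNEL);
	return vmid_map ? 0 : -ENOMEM;
}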
The register state:
[ 224.378174] pc : kvm_arm_vmid_alloc_init+0xa0/0xc0
[ 224.382954] lr : kvm_arm_vmid_alloc_init+0x24/0xc0
[ 224.387734] sp : ffff80008009bd40
[ 224.391035] x29: ffff80008009bd40 x28: ffff0020209bd3c0 x27: ffffce5349159068
[ 224.398162] x26: ffffce5349070118 x25: ffffce5348fb8eb8 x24: ffffce5349059128
[ 224.405287] x23: 0000000000000109 x22: ffff0020208ea6c0 x21: 0000000000000004
[ 224.412413] x20: ffffce5349c20b78 x19: 0000000000000000 x18: 00000000ffffffff
[ 224.419538] x17: 00000000e9a61a0d x16: 00000000b1c06f2c x15: 00000000ffffffff
[ 224.426663] x14: 0000000000000000 x13: 7374696220343420 x12: 3a74696d694c2065
[ 224.433789] x11: ffffffffffe00000 x10: ffff00275c260000 x9 : ffffce5348048be0
[ 224.440914] x8 : 00000000fffeffff x7 : ffff00275c260000 x6 : 80000000ffff0000
[ 224.448039] x5 : 0000000000000048 x4 : 0000000000000110 x3 : ffffce5348fc1000
[ 224.455164] x2 : 0000000000000100 x1 : 0000000000000100 x0 : 00000000000000ff
The disassembly:
ffff8000816ff220 <kvm_arm_vmid_alloc_init>:
ffff8000816ff220: d503201f nop
ffff8000816ff224: d503201f nop
ffff8000816ff228: d503233f paciasp
ffff8000816ff22c: a9be7bfd stp x29, x30, [sp, #-32]!
ffff8000816ff230: 5280e400 mov w0, #0x720 // #1824
ffff8000816ff234: 910003fd mov x29, sp
ffff8000816ff238: 72a00300 movk w0, #0x18, lsl #16
ffff8000816ff23c: f9000bf3 str x19, [sp, #16]
ffff8000816ff240: 97a4a61c bl ffff800080028ab0 <read_sanitised_ftr_reg>
ffff8000816ff244: d3441c00 ubfx x0, x0, #4, #4
ffff8000816ff248: d0fffa02 adrp x2, ffff800081641000 <rodata_full>
ffff8000816ff24c: d0fffa03 adrp x3, ffff800081641000 <rodata_full>
ffff8000816ff250: f100081f cmp x0, #0x2
ffff8000816ff254: 52800201 mov w1, #0x10 // #16
ffff8000816ff258: b940f044 ldr w4, [x2, #240]
ffff8000816ff25c: 52800102 mov w2, #0x8 // #8
ffff8000816ff260: d29fffe0 mov x0, #0xffff // #65535
ffff8000816ff264: 1a820021 csel w1, w1, w2, eq // eq = none
ffff8000816ff268: d2801fe2 mov x2, #0xff // #255
ffff8000816ff26c: b9005061 str w1, [x3, #80]
ffff8000816ff270: 9a820000 csel x0, x0, x2, eq // eq = none
ffff8000816ff274: d2802001 mov x1, #0x100 // #256
ffff8000816ff278: d2a00022 mov x2, #0x10000 // #65536
ffff8000816ff27c: 9a810042 csel x2, x2, x1, eq // eq = none
ffff8000816ff280: eb00009f cmp x4, x0
ffff8000816ff284: 540001e2 b.cs ffff8000816ff2c0 <kvm_arm_vmid_alloc_init+0xa0> // b.hs, b.nlast
That's the branch to the...
[...]
ffff8000816ff2c0: d4210000 brk #0x800
... BRK instruction.
So x0=255 (NUM_USER_VMIDS - 1 with 8-bit VMIDs) and x4=272 (supposedly
num_possible_cpus()). 272 possible CPUs on a machine with only 16?
Bollocks.
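For reference, the ubfx/cmp/csel dance above is just the compiler
inlining the VMID-width selection. A sketch from memory of the helpers
in arch/arm64/include/asm/kvm_mmu.h, not a verbatim quote:

/* 0x180720 in w0 is the sysreg encoding of ID_AA64MMFR1_EL1 */
static inline unsigned int get_vmid_bits(u64 mmfr1)
{
	/* the ubfx extracts VMIDBits, bits [7:4] of ID_AA64MMFR1_EL1 */
	int vmid_bits = cpuid_feature_extract_unsigned_field(mmfr1,
				ID_AA64MMFR1_EL1_VMIDBits_SHIFT);

	/* VMIDBits == 0b0010 (the cmp #0x2 above) advertises FEAT_VMID16 */
	if (vmid_bits == ID_AA64MMFR1_EL1_VMIDBits_16)
		return 16;

	return 8;
}

static inline unsigned int kvm_get_vmid_bits(void)
{
	return get_vmid_bits(read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1));
}

With 8-bit VMIDs, NUM_USER_VMIDS - 1 is the 255 sitting in x0, and x4
is whatever got loaded from rodata at [x2, #240], presumably the cached
CPU count.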
Something is badly screwed in -next, and I'm not convinced it is KVM.
d0f23ccf6ba9e ("cpumask: Cache num_possible_cpus()")

is my current suspect.
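I haven't dug into that patch yet, so this is purely illustrative and
not necessarily what it does, but the obvious failure mode for such a
cache is initialisation ordering:

/* Hypothetical sketch, NOT a quote of d0f23ccf6ba9e */
static unsigned int cached_num_possible_cpus;

static int __init cache_possible_cpus(void)
{
	/*
	 * If this snapshot is taken before the architecture code has
	 * finished shaping cpu_possible_mask, every later caller sees
	 * a stale count instead of the real one (16 here), and checks
	 * like the VMID WARN_ON above trip for no good reason.
	 */
	cached_num_possible_cpus = cpumask_weight(cpu_possible_mask);
	return 0;
}
early_initcall(cache_possible_cpus);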
M.
--
Without deviation from the norm, progress is not possible.