[PATCH v4 5/5] KVM: arm64: vgic-its: Clear ITE when DISCARD frees an ITE

Mon Aug 11 05:40:49 PDT 2025

On Fri, 2025-05-16 at 10:52 +0100, Marc Zyngier wrote:
> On Mon, 12 May 2025 15:09:09 +0100,
> David Sauerwein <dssauerw at amazon.de> wrote:
> > 
> > Hi Jing,
> > 
> > After pulling this patch in via the v6.6.64 and v5.10.226 LTS releases, I see
> > NULL pointer dereferences in some guests. The dereference happens in different
> > parts of the kernel outside of the GIC driver (file systems, NVMe driver,
> > etc.). The issue only appears once every few hundred DISCARDs / guest boots.
> > Reverting the commit does fix the problem. I have seen multiple different guest
> > kernel versions (4.14, 5.15) and distributions exhibit this issue.
> 
> Where is the guest stack trace?

[  157.126835] Unable to handle kernel NULL pointer dereference at virtual address 000002e8
[  157.128248] Mem abort info:
[  157.128745]   Exception class = DABT (current EL), IL = 32 bits
[  157.129736]   SET = 0, FnV = 0
[  157.130266]   EA = 0, S1PTW = 0
[  157.130794] Data abort info:
[  157.131273]   ISV = 0, ISS = 0x00000004
[  157.131933]   CM = 0, WnR = 0
[  157.132451] user pgtable: 4k pages, 48-bit VAs, pgd = ffff8003f5d4d000
[  157.133556] [00000000000002e8] *pgd=0000000000000000
[  157.134414] Internal error: Oops: 96000004 [#1] SMP
[  157.135238] Modules linked in: sunrpc vfat fat dm_mirror dm_region_hash dm_log dm_mod crc32_ce ghash_ce sha2_ce sha256_arm64 sha1_ce ena ptp pps_core
[  157.137452] Process kworker/0:1 (pid: 28, stack limit = 0xffff000009dd8000)
[  157.138741] CPU: 0 PID: 28 Comm: kworker/0:1 Not tainted 4.14.336-253.554.amzn2.aarch64 #1
[  157.140276] Hardware name: Amazon EC2 c7g.medium/, BIOS 1.0 11/1/2018
[  157.141502] Workqueue: xfs-reclaim/nvme0n1p1 xfs_reclaim_worker
[  157.142629] task: ffff8003f91de600 task.stack: ffff000009dd8000
[  157.143757] pc : xfs_perag_clear_reclaim_tag+0x4c/0x120
[  157.144747] lr : 0x0
[  157.145188] sp : ffff000009ddbb50 pstate : 80c00145
[  157.146118] x29: ffff000009ddbb50 x28: ffff8003f8052e00 
[  157.147126] x27: 0000000000000000 x26: 000000000007bc78 
[  157.148165] x25: ffff000008d36000 x24: ffff8003f8052e00 
[  157.149151] x23: ffff000009ddbb80 x22: 0000000000000000 
[  157.150139] x21: ffff00000843bd5c x20: 0000000000000000 
[  157.151135] x19: ffff8003f8052e00 x18: 0000000000000038 
[  157.152146] x17: 0000ffff8b577980 x16: ffff0000083255c8 
[  157.153132] x15: 0000000000000000 x14: ffff8003f8052e70 
[  157.154137] x13: ffff8003f8f85b60 x12: ffff8003f8f85d49 
[  157.155144] x11: ffff8003f8f85b88 x10: 0000000000000000 
[  157.156158] x9 : 0000000000000039 x8 : 0000000000000007 
[  157.157180] x7 : 000000000000003e x6 : 0000000000000038 
[  157.158199] x5 : 0000000000000000 x4 : 0000000000000000 
[  157.159205] x3 : 00000000000002e8 x2 : 0000000000000001 
[  157.160211] x1 : 0000000000000000 x0 : 00000000000002e8 
[  157.161209] Call trace:
[  157.161709]  xfs_perag_clear_reclaim_tag+0x4c/0x120
[  157.162644]  xfs_reclaim_inode+0x314/0x49c
[  157.163432]  xfs_reclaim_inodes_ag+0x1ac/0x2fc
[  157.164290]  xfs_reclaim_worker+0x4c/0x80
[  157.165065]  process_one_work+0x198/0x3e0
[  157.165841]  worker_thread+0x4c/0x458
[  157.166544]  kthread+0x138/0x13c
[  157.167172]  ret_from_fork+0x10/0x2c
[  157.167858] Code: d2800001 52800022 aa0303e0 2a0103fe (88fe7c62) 
[  157.169013] ---[ end trace 0a6955946156d7d5 ]---
[  157.169905] Kernel panic - not syncing: Fatal exception
[  157.170894] Kernel Offset: disabled
[  157.171583] CPU features: 0x2,28002238
[  157.172300] Memory Limit: none
[  157.172893] Rebooting in 30 seconds..
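
As a sanity check on the oops itself, here is a minimal Python sketch that
decodes the ESR value from the "Internal error: Oops: 96000004" line. The
field positions are the ESR_ELx layout for data aborts as I read it in the
Arm ARM; the helper name decode_esr() is made up for illustration and is
not a kernel function.

```python
# Sketch only: split the ESR value from the oops into a few fields.
# Bit positions assume the architectural ESR_ELx layout for data aborts.

def decode_esr(esr: int) -> dict:
    return {
        "ec": (esr >> 26) & 0x3F,   # exception class, bits [31:26]
        "il": (esr >> 25) & 0x1,    # instruction length, 1 = 32-bit
        "wnr": (esr >> 6) & 0x1,    # write-not-read, 0 = read access
        "dfsc": esr & 0x3F,         # data fault status code, bits [5:0]
    }


if __name__ == "__main__":
    fields = decode_esr(0x96000004)
    print({name: hex(value) for name, value in fields.items()})
```

EC 0x25 is a data abort taken without a change in exception level, and
DFSC 0x04 is a level 0 translation fault, which agrees with the
"DABT (current EL)" and "*pgd=0000000000000000" lines in the log.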

Hypervisor debug logs show a DISCARD command whose gpa might match
x28/x24 (and x19) in the oops above:

vgic_its_cmd_handle_discard gpa=438052e00 ite=00000000facc1299
event_id=0 device_id=32 ite_esz=8 vgic_its_base=10080000
vgic_its_check_event_id()=1
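
A quick way to test that guess is to translate the register value through
the guest's linear map. The Python sketch below does so under two
assumptions that are not established anywhere in this thread: that the
4.14 guest uses the 48-bit-VA linear map base 0xffff800000000000, and that
guest RAM (and so the kernel's PHYS_OFFSET) starts at 0x40000000. The
helper name linear_virt_to_phys() is hypothetical.

```python
# Sketch only: compare x28/x24 from the oops with the gpa in the DISCARD log.
PAGE_OFFSET = 0xFFFF_8000_0000_0000   # assumed linear map base (48-bit VAs)
ASSUMED_RAM_BASE = 0x4000_0000        # ASSUMPTION: guest RAM starts at 1 GiB


def linear_virt_to_phys(va: int, ram_base: int = ASSUMED_RAM_BASE) -> int:
    """Translate a linear-map kernel VA to a guest physical address."""
    return va - PAGE_OFFSET + ram_base


if __name__ == "__main__":
    x28 = 0xFFFF8003F8052E00          # x28/x24 in the oops
    gpa = 0x438052E00                 # gpa= in the hypervisor debug log
    print(hex(linear_virt_to_phys(x28)), hex(gpa))
```

If those assumptions hold, the two values coincide. That would only show
that the register points at the page the DISCARD touched; it would not by
itself show who wrote to it last.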

David, did we ever establish whether Ilias's patch from
https://lore.kernel.org/all/20250414111244.153528-1-ilstam@amazon.com/
makes the problem go away? It serializes the GIC state to userspace
like KVM does for most other devices, instead of doing the dubious
thing that the GIC specification *permits* and scribbling it to guest
memory.

If we look at the GIC specification, it says that behaviour is
UNPREDICTABLE in various cases where software writes to tables that the
GIC owns, or if those tables aren't zeroed when given to the GIC. It
would perhaps be useful to add a mode to QEMU which *enforces* that,
taking the affected pages out of the guest's memory map (and emulating
writes to parts of those pages which the guest *is* still allowed to
touch, etc.).

If this is indeed a guest bug, as I suspect, it should show up fairly
quickly in such a setup. Would be useful for catching other guest bugs
caused by this GIC feature too, like
https://lore.kernel.org/all/c69938cffd4002a93a95a396affaa945e0f69206.camel@infradead.org/

> > The issue looks like some kind of race. I think the guest re-uses the memory
> > allocated for the ITT before the hypervisor is actually done with the DISCARD
> > command, i.e. before it zeros the ITE. From what I can tell, the guest should
> > wait for the command to finish via its_wait_for_range_completion(). I tried
> > locking reads to its->cwriter in vgic_mmio_read_its_cwriter() and its->creadr
> > in vgic_mmio_read_its_creadr() with its->cmd_lock in the hypervisor kernel, but
> > that did not help. I also instrumented the guest kernel both via printk() and
> > trace events. In both cases the issue disappears once the instrumentation is in
> > place, so I'm not able to fully observe what is happening on the guest side.
> > 
> > Do you have an idea of what might cause the issue?
> 
> I'm a bit sceptical of this analysis, because KVM makes no use of the
> guest's owned memory outside of a save/restore event, and otherwise
> shadows everything.

Hypervisor live updates or live migration could trigger precisely that
save/restore event at any time, surely?
