[PATCH v2] RISC-V: KVM: Fix use-after-free in kvm_riscv_gstage_get_leaf()
Jiakai Xu
xujiakai2025 at iscas.ac.cn
Thu Feb 26 01:08:31 PST 2026
Hi Anup,
Thanks for taking care of this and queuing the fix.
Best regards,
Jiakai
> -----Original Messages-----
> From: "Anup Patel" <anup at brainfault.org>
> Sent Time: 2026-02-26 16:26:54 (Thursday)
> To: "Jiakai Xu" <xujiakai2025 at iscas.ac.cn>
> Cc: linux-kernel at vger.kernel.org, linux-riscv at lists.infradead.org, kvm-riscv at lists.infradead.org, kvm at vger.kernel.org, "Alexandre Ghiti" <alex at ghiti.fr>, "Albert Ou" <aou at eecs.berkeley.edu>, "Palmer Dabbelt" <palmer at dabbelt.com>, "Paul Walmsley" <paul.walmsley at sifive.com>, "Atish Patra" <atish.patra at linux.dev>, "Jiakai Xu" <jiakaiPeanut at gmail.com>
> Subject: Re: [PATCH v2] RISC-V: KVM: Fix use-after-free in kvm_riscv_gstage_get_leaf()
>
> On Mon, Feb 2, 2026 at 9:31 AM Jiakai Xu <xujiakai2025 at iscas.ac.cn> wrote:
> >
> > While fuzzing KVM on RISC-V, a use-after-free was observed in
> > kvm_riscv_gstage_get_leaf(), where ptep_get() dereferences a
> > freed gstage page table page during gfn unmap.
> >
> > The crash manifests as:
> > use-after-free in ptep_get include/linux/pgtable.h:340 [inline]
> > use-after-free in kvm_riscv_gstage_get_leaf arch/riscv/kvm/gstage.c:89
> > Call Trace:
> > ptep_get include/linux/pgtable.h:340 [inline]
> > kvm_riscv_gstage_get_leaf+0x2ea/0x358 arch/riscv/kvm/gstage.c:89
> > kvm_riscv_gstage_unmap_range+0xf0/0x308 arch/riscv/kvm/gstage.c:265
> > kvm_unmap_gfn_range+0x168/0x1fc arch/riscv/kvm/mmu.c:256
> > kvm_mmu_unmap_gfn_range virt/kvm/kvm_main.c:724 [inline]
> > page last free pid 808 tgid 808 stack trace:
> > kvm_riscv_mmu_free_pgd+0x1b6/0x26a arch/riscv/kvm/mmu.c:457
> > kvm_arch_flush_shadow_all+0x1a/0x24 arch/riscv/kvm/mmu.c:134
> > kvm_flush_shadow_all virt/kvm/kvm_main.c:344 [inline]
> >
> > The UAF is caused by gstage page table walks running concurrently with
> > gstage pgd teardown. In particular, kvm_unmap_gfn_range() can traverse
> > gstage page tables while kvm_arch_flush_shadow_all() frees the pgd,
> > leading to use-after-free of page table pages.
> >
> > Fix the issue by serializing gstage unmap and pgd teardown with
> > kvm->mmu_lock. Holding mmu_lock ensures that gstage page tables
> > remain valid for the duration of unmap operations and prevents
> > concurrent frees.
> >
> > This matches existing RISC-V KVM usage of mmu_lock to protect gstage
> > map/unmap operations, e.g. kvm_riscv_mmu_iounmap.
> >
> > Fixes: dd82e35638d67f ("RISC-V: KVM: Factor-out g-stage page table management")
> > Signed-off-by: Jiakai Xu <xujiakai2025 at iscas.ac.cn>
> > Signed-off-by: Jiakai Xu <jiakaiPeanut at gmail.com>
> > ---
> > V1 -> V2: Removed kvm->mmu_lock in kvm_arch_flush_shadow_all().
> >
> > arch/riscv/kvm/mmu.c | 2 ++
> > 1 file changed, 2 insertions(+)
> >
> > diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
> > index a1c3b2ec1dde5..1d71c1cb429ca 100644
> > --- a/arch/riscv/kvm/mmu.c
> > +++ b/arch/riscv/kvm/mmu.c
> > @@ -268,9 +268,11 @@ bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
> > gstage.flags = 0;
> > gstage.vmid = READ_ONCE(kvm->arch.vmid.vmid);
> > gstage.pgd = kvm->arch.pgd;
> > + spin_lock(&kvm->mmu_lock);
>
> Unconditionally taking mmu_lock here causes the following crash
> when powering off the KVM guest.
>
> [ 88.985889] rcu: INFO: rcu_sched self-detected stall on CPU
> [ 88.986721] rcu: 1-....: (5249 ticks this GP) idle=9184/1/0x4000000000000000 softirq=175/175 fqs=2223
> [ 88.987816] rcu: (t=5250 jiffies g=-791 q=31 ncpus=4)
> [ 88.988993] CPU: 1 UID: 0 PID: 78 Comm: lkvm-static Not tainted 7.0.0-rc1-00002-gf242f3f353e6-dirty #3 PREEMPTLAZY
> [ 88.989294] Hardware name: riscv-virtio,qemu (DT)
> [ 88.989401] epc : queued_spin_lock_slowpath+0x54/0x474
> [ 88.990144] ra : do_raw_spin_lock+0xaa/0xd0
> epc : ffffffff80bc7404 ra : ffffffff800893ea sp : ff200000003bb8d0
> gp : ffffffff81a32490 tp : ff60000002360c80 t0 : 616d6e755f6d766b
> t1 : 00000000fffff000 t2 : 70616d6e755f6d76 s0 : ff200000003bb8e0
> s1 : 00007fff7f600000 a0 : 0000000000000000 a1 : ff600000047b7000
> a2 : 00000000000000ff a3 : 0000000000000000 a4 : 0000000000000001
> a5 : ff600000047b7000 a6 : ffffffff81876808 a7 : 80000000fffff000
> s2 : ff600000047b7000 s3 : ff600000047b7a90 s4 : 0000000000000000
> s5 : 00007fff8f600000 s6 : 0000000000000001 s7 : 0000000000000001
> s8 : 0000000000000fff s9 : ff600000047b7488 s10: ffffffffffffffe0
> s11: ff60000003b8e050 t3 : ffffffff81a49eb7 t4 : ffffffff81a49eb7
> t5 : ffffffff81a49eb8 t6 : ff200000003bb728 ssp : 0000000000000000
> status: 0000000200000120 badaddr: 0000000000000000 cause: 8000000000000005
> [ 88.990581] [<ffffffff80bc7404>] queued_spin_lock_slowpath+0x54/0x474
> [ 88.990696] [<ffffffff80bc704e>] _raw_spin_lock+0x1a/0x24
> [ 88.990917] [<ffffffff01ab4a58>] kvm_unmap_gfn_range+0x98/0xc8 [kvm]
> [ 88.991415] [<ffffffff01aa5d22>] kvm_mmu_notifier_invalidate_range_start+0x17e/0x324 [kvm]
> [ 88.991608] [<ffffffff8027f6da>] __mmu_notifier_invalidate_range_start+0x62/0x1bc
> [ 88.991635] [<ffffffff8022d554>] unmap_vmas+0x120/0x134
> [ 88.991654] [<ffffffff8024cc0a>] unmap_region+0x76/0xc0
> [ 88.991675] [<ffffffff8024cd18>] vms_complete_munmap_vmas+0xc4/0x1c0
> [ 88.991695] [<ffffffff8024dd5e>] do_vmi_align_munmap+0x152/0x178
> [ 88.991716] [<ffffffff8024de24>] do_vmi_munmap+0xa0/0x148
> [ 88.991736] [<ffffffff8024f4b2>] __vm_munmap+0xaa/0x140
> [ 88.991757] [<ffffffff802389c8>] __riscv_sys_munmap+0x38/0x40
> [ 88.991778] [<ffffffff80bbb048>] do_trap_ecall_u+0x260/0x45c
> [ 88.991812] [<ffffffff80bc87a0>] handle_exception+0x168/0x174
>
> Instead, we should take mmu_lock only if it is not already held.
>
> Something like this ...
>
> diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
> index 0b75eb2a1820..87c8f41482c5 100644
> --- a/arch/riscv/kvm/mmu.c
> +++ b/arch/riscv/kvm/mmu.c
> @@ -245,6 +245,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
> bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
> {
> struct kvm_gstage gstage;
> + bool mmu_locked;
>
> if (!kvm->arch.pgd)
> return false;
> @@ -253,9 +254,12 @@ bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
> gstage.flags = 0;
> gstage.vmid = READ_ONCE(kvm->arch.vmid.vmid);
> gstage.pgd = kvm->arch.pgd;
> + mmu_locked = spin_trylock(&kvm->mmu_lock);
> kvm_riscv_gstage_unmap_range(&gstage, range->start << PAGE_SHIFT,
> (range->end - range->start) << PAGE_SHIFT,
> range->may_block);
> + if (mmu_locked)
> + spin_unlock(&kvm->mmu_lock);
> return false;
> }
>
> I have taken care of this and queued it as a fix for Linux-7.0-rcX
>
> Regards,
> Anup