[PATCH v2] RISC-V: KVM: Fix use-after-free in kvm_riscv_gstage_get_leaf()

Jiakai Xu xujiakai2025 at iscas.ac.cn
Thu Feb 26 01:08:31 PST 2026


Hi Anup,

Thanks for taking care of this and queuing the fix.

Best regards,
Jiakai


> -----Original Messages-----
> From: "Anup Patel" <anup at brainfault.org>
> Sent Time: 2026-02-26 16:26:54 (Thursday)
> To: "Jiakai Xu" <xujiakai2025 at iscas.ac.cn>
> Cc: linux-kernel at vger.kernel.org, linux-riscv at lists.infradead.org, kvm-riscv at lists.infradead.org, kvm at vger.kernel.org, "Alexandre Ghiti" <alex at ghiti.fr>, "Albert Ou" <aou at eecs.berkeley.edu>, "Palmer Dabbelt" <palmer at dabbelt.com>, "Paul Walmsley" <paul.walmsley at sifive.com>, "Atish Patra" <atish.patra at linux.dev>, "Jiakai Xu" <jiakaiPeanut at gmail.com>
> Subject: Re: [PATCH v2] RISC-V: KVM: Fix use-after-free in kvm_riscv_gstage_get_leaf()
> 
> On Mon, Feb 2, 2026 at 9:31 AM Jiakai Xu <xujiakai2025 at iscas.ac.cn> wrote:
> >
> > While fuzzing KVM on RISC-V, a use-after-free was observed in
> > kvm_riscv_gstage_get_leaf(), where ptep_get() dereferences a
> > freed gstage page table page during gfn unmap.
> >
> > The crash manifests as:
> >   use-after-free in ptep_get include/linux/pgtable.h:340 [inline]
> >   use-after-free in kvm_riscv_gstage_get_leaf arch/riscv/kvm/gstage.c:89
> >   Call Trace:
> >     ptep_get include/linux/pgtable.h:340 [inline]
> >     kvm_riscv_gstage_get_leaf+0x2ea/0x358 arch/riscv/kvm/gstage.c:89
> >     kvm_riscv_gstage_unmap_range+0xf0/0x308 arch/riscv/kvm/gstage.c:265
> >     kvm_unmap_gfn_range+0x168/0x1fc arch/riscv/kvm/mmu.c:256
> >     kvm_mmu_unmap_gfn_range virt/kvm/kvm_main.c:724 [inline]
> >   page last free pid 808 tgid 808 stack trace:
> >     kvm_riscv_mmu_free_pgd+0x1b6/0x26a arch/riscv/kvm/mmu.c:457
> >     kvm_arch_flush_shadow_all+0x1a/0x24 arch/riscv/kvm/mmu.c:134
> >     kvm_flush_shadow_all virt/kvm/kvm_main.c:344 [inline]
> >
> > The UAF is caused by gstage page table walks running concurrently with
> > gstage pgd teardown. In particular, kvm_unmap_gfn_range() can traverse
> > gstage page tables while kvm_arch_flush_shadow_all() frees the pgd,
> > leading to use-after-free of page table pages.
> >
> > Fix the issue by serializing gstage unmap and pgd teardown with
> > kvm->mmu_lock. Holding mmu_lock ensures that gstage page tables
> > remain valid for the duration of unmap operations and prevents
> > concurrent frees.
> >
> > This matches existing RISC-V KVM usage of mmu_lock to protect gstage
> > map/unmap operations, e.g. kvm_riscv_mmu_iounmap.
> >
> > Fixes: dd82e35638d67f ("RISC-V: KVM: Factor-out g-stage page table management")
> > Signed-off-by: Jiakai Xu <xujiakai2025 at iscas.ac.cn>
> > Signed-off-by: Jiakai Xu <jiakaiPeanut at gmail.com>
> > ---
> > V1 -> V2: Removed kvm->mmu_lock in kvm_arch_flush_shadow_all().
> >
> >  arch/riscv/kvm/mmu.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
> > index a1c3b2ec1dde5..1d71c1cb429ca 100644
> > --- a/arch/riscv/kvm/mmu.c
> > +++ b/arch/riscv/kvm/mmu.c
> > @@ -268,9 +268,11 @@ bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
> >         gstage.flags = 0;
> >         gstage.vmid = READ_ONCE(kvm->arch.vmid.vmid);
> >         gstage.pgd = kvm->arch.pgd;
> > +       spin_lock(&kvm->mmu_lock);
> 
> Unconditionally taking mmu_lock here causes the following crash
> when powering off the KVM guest.
> 
> [   88.985889] rcu: INFO: rcu_sched self-detected stall on CPU
> [   88.986721] rcu:     1-....: (5249 ticks this GP) idle=9184/1/0x4000000000000000 softirq=175/175 fqs=2223
> [   88.987816] rcu:     (t=5250 jiffies g=-791 q=31 ncpus=4)
> [   88.988993] CPU: 1 UID: 0 PID: 78 Comm: lkvm-static Not tainted 7.0.0-rc1-00002-gf242f3f353e6-dirty #3 PREEMPTLAZY
> [   88.989294] Hardware name: riscv-virtio,qemu (DT)
> [   88.989401] epc : queued_spin_lock_slowpath+0x54/0x474
> [   88.990144]  ra : do_raw_spin_lock+0xaa/0xd0
> [   88.990182] epc : ffffffff80bc7404 ra : ffffffff800893ea sp : ff200000003bb8d0
> [   88.990213]  gp : ffffffff81a32490 tp : ff60000002360c80 t0 : 616d6e755f6d766b
> [   88.990231]  t1 : 00000000fffff000 t2 : 70616d6e755f6d76 s0 : ff200000003bb8e0
> [   88.990286]  s1 : 00007fff7f600000 a0 : 0000000000000000 a1 : ff600000047b7000
> [   88.990304]  a2 : 00000000000000ff a3 : 0000000000000000 a4 : 0000000000000001
> [   88.990322]  a5 : ff600000047b7000 a6 : ffffffff81876808 a7 : 80000000fffff000
> [   88.990341]  s2 : ff600000047b7000 s3 : ff600000047b7a90 s4 : 0000000000000000
> [   88.990359]  s5 : 00007fff8f600000 s6 : 0000000000000001 s7 : 0000000000000001
> [   88.990378]  s8 : 0000000000000fff s9 : ff600000047b7488 s10: ffffffffffffffe0
> [   88.990396]  s11: ff60000003b8e050 t3 : ffffffff81a49eb7 t4 : ffffffff81a49eb7
> [   88.990433]  t5 : ffffffff81a49eb8 t6 : ff200000003bb728 ssp : 0000000000000000
> [   88.990451] status: 0000000200000120 badaddr: 0000000000000000 cause: 8000000000000005
> [   88.990581] [<ffffffff80bc7404>] queued_spin_lock_slowpath+0x54/0x474
> [   88.990696] [<ffffffff80bc704e>] _raw_spin_lock+0x1a/0x24
> [   88.990917] [<ffffffff01ab4a58>] kvm_unmap_gfn_range+0x98/0xc8 [kvm]
> [   88.991415] [<ffffffff01aa5d22>] kvm_mmu_notifier_invalidate_range_start+0x17e/0x324 [kvm]
> [   88.991608] [<ffffffff8027f6da>] __mmu_notifier_invalidate_range_start+0x62/0x1bc
> [   88.991635] [<ffffffff8022d554>] unmap_vmas+0x120/0x134
> [   88.991654] [<ffffffff8024cc0a>] unmap_region+0x76/0xc0
> [   88.991675] [<ffffffff8024cd18>] vms_complete_munmap_vmas+0xc4/0x1c0
> [   88.991695] [<ffffffff8024dd5e>] do_vmi_align_munmap+0x152/0x178
> [   88.991716] [<ffffffff8024de24>] do_vmi_munmap+0xa0/0x148
> [   88.991736] [<ffffffff8024f4b2>] __vm_munmap+0xaa/0x140
> [   88.991757] [<ffffffff802389c8>] __riscv_sys_munmap+0x38/0x40
> [   88.991778] [<ffffffff80bbb048>] do_trap_ecall_u+0x260/0x45c
> [   88.991812] [<ffffffff80bc87a0>] handle_exception+0x168/0x174
> 
> Instead, we should only take mmu_lock if it is not already held.
> 
> Something like this ...
> 
> diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
> index 0b75eb2a1820..87c8f41482c5 100644
> --- a/arch/riscv/kvm/mmu.c
> +++ b/arch/riscv/kvm/mmu.c
> @@ -245,6 +245,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
>  bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
>  {
>         struct kvm_gstage gstage;
> +       bool mmu_locked;
> 
>         if (!kvm->arch.pgd)
>                 return false;
> @@ -253,9 +254,12 @@ bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
>         gstage.flags = 0;
>         gstage.vmid = READ_ONCE(kvm->arch.vmid.vmid);
>         gstage.pgd = kvm->arch.pgd;
> +       mmu_locked = spin_trylock(&kvm->mmu_lock);
>         kvm_riscv_gstage_unmap_range(&gstage, range->start << PAGE_SHIFT,
>                                      (range->end - range->start) << PAGE_SHIFT,
>                                      range->may_block);
> +       if (mmu_locked)
> +               spin_unlock(&kvm->mmu_lock);
>         return false;
>  }
> 
> I have taken care of this and queued it as a fix for Linux-7.0-rcX.
> 
> Regards,
> Anup
