[PATCH 2/2] KVM: arm64: Destroy the stage-2 page-table periodically
Raghavendra Rao Ananta
rananta at google.com
Thu Aug 7 11:58:01 PDT 2025
Hi Oliver,
>
> Protected mode is affected by the same problem, potentially even worse
> due to the overheads of calling into EL2. Both protected and
> non-protected flows should use stage2_destroy_range().
>
I experimented with this (see the diff below), and it takes
significantly longer to finish the destruction, even for a very small
VM: ~140 seconds on an Ampere Altra machine. This is most likely
because we call cond_resched() after every chunk across the entire
sweep of the possible address range, 0 to ~(0ULL), even where there
are no actual mappings, so we context-switch out far more often.
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
+static void kvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt)
+{
+	u64 end = is_protected_kvm_enabled() ? ~(0ULL) : BIT(pgt->ia_bits);
+	u64 next, addr = 0;
+
+	do {
+		next = stage2_range_addr_end(addr, end);
+		KVM_PGT_FN(kvm_pgtable_stage2_destroy_range)(pgt, addr, next - addr);
+
+		if (next != end)
+			cond_resched();
+	} while (addr = next, addr != end);
+
+	KVM_PGT_FN(kvm_pgtable_stage2_destroy_pgd)(pgt);
+}
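For a sense of scale: sweeping the whole 64-bit range chunk by chunk
is an enormous number of loop iterations regardless of whether
anything is mapped. A quick user-space sketch (the 1GiB stride here is
an assumption for illustration; the actual stride is whatever
stage2_range_addr_end() steps by on the configuration under test):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	uint64_t chunk = 1ULL << 30;		/* assumed 1GiB stride */
	uint64_t iters = UINT64_MAX / chunk + 1;	/* 2^64 / chunk == 2^34 */

	printf("full-sweep iterations: %llu\n", (unsigned long long)iters);
	return 0;
}

That prints 17179869184; even at ~8ns per empty iteration, the sweep
alone costs over two minutes, which is in the same ballpark as the
~140 seconds measured above.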
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -316,9 +316,13 @@ static int __pkvm_pgtable_stage2_unmap(struct kvm_pgtable *pgt, u64 start, u64 e
 	return 0;
 }
-void pkvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt)
+void pkvm_pgtable_stage2_destroy_range(struct kvm_pgtable *pgt, u64 addr, u64 size)
+{
+	__pkvm_pgtable_stage2_unmap(pgt, addr, addr + size);
+}
+
+void pkvm_pgtable_stage2_destroy_pgd(struct kvm_pgtable *pgt)
+{
+}
Without cond_resched() in place, destruction takes about half that time.

I also tried moving cond_resched() into __pkvm_pgtable_stage2_unmap(),
as per the diff below, and calling pkvm_pgtable_stage2_destroy_range()
once for the entire 0 to ~(0ULL) range instead of breaking it up (see
the caller sketch after the diff). Even for a fully 4K-mapped 128G VM,
this takes ~65 seconds, which is close to the baseline (no
cond_resched()).
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -311,8 +311,11 @@ static int __pkvm_pgtable_stage2_unmap(struct kvm_pgtable *pgt, u64 start, u64 e
 			return ret;
 		pkvm_mapping_remove(mapping, &pgt->pkvm_mappings);
 		kfree(mapping);
+		cond_resched();
 	}
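For completeness, the caller side in that experiment looks roughly
like the sketch below (not the exact code I ran): the protected path
makes a single full-range call and relies on the per-mapping
cond_resched() above, while the non-protected path keeps the chunked
walk from the first diff.

static void kvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt)
{
	u64 next, addr = 0, end;

	if (is_protected_kvm_enabled()) {
		/*
		 * One call covering the whole range;
		 * __pkvm_pgtable_stage2_unmap() now resched's after each
		 * mapping it actually tears down.
		 */
		KVM_PGT_FN(kvm_pgtable_stage2_destroy_range)(pgt, 0, ~(0ULL));
		KVM_PGT_FN(kvm_pgtable_stage2_destroy_pgd)(pgt);
		return;
	}

	end = BIT(pgt->ia_bits);
	do {
		next = stage2_range_addr_end(addr, end);
		KVM_PGT_FN(kvm_pgtable_stage2_destroy_range)(pgt, addr, next - addr);

		if (next != end)
			cond_resched();
	} while (addr = next, addr != end);

	KVM_PGT_FN(kvm_pgtable_stage2_destroy_pgd)(pgt);
}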
Does it make sense to call cond_resched() only when we actually unmap pages?
Thank you.
Raghavendra