[PATCH 00/14] KVM: arm64: Parallel stage-2 fault handling
Marc Zyngier
maz at kernel.org
Tue Sep 6 03:00:09 PDT 2022
On Tue, 30 Aug 2022 20:41:18 +0100,
Oliver Upton <oliver.upton at linux.dev> wrote:
>
> Presently KVM only takes a read lock for stage 2 faults if it believes
> the fault can be fixed by relaxing permissions on a PTE (write unprotect
> for dirty logging). Otherwise, stage 2 faults grab the write lock, which
> predictably can pile up all the vCPUs in a sufficiently large VM.
>
> Like the TDP MMU for x86, this series loosens the locking around
> manipulations of the stage 2 page tables to allow parallel faults. RCU
> and atomics are exploited to safely build/destroy the stage 2 page
> tables in light of multiple software observers.
>
> Patches 1-2 are a cleanup to the way we collapse page tables, with the
> added benefit of narrowing the window of time a range of memory is
> unmapped.
>
> Patches 3-7 are minor cleanups and refactorings to the way KVM reads
> PTEs and traverses the stage 2 page tables to make it amenable to
> concurrent modification.
>
> Patches 8-9 use RCU to punt page table cleanup out of the vCPU fault
> path, which should also improve fault latency a bit.
>
> Patches 10-13 implement the meat of this series, extending the
> 'break-before-make' sequence with atomics to realize locking on PTEs.
> Effectively a cmpxchg() is used to 'break' a PTE, thereby serializing
> changes to a given PTE.
>
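For readers following along, a minimal userspace sketch of the idea in the
paragraph above: a cmpxchg swaps the observed PTE for an invalid "locked"
marker, so only one walker at a time owns a given PTE. This is not the
series' code. pte_t, PTE_LOCKED, pte_try_break(), pte_make() and
demo_two_walkers() are names made up for illustration, C11 atomics stand in
for the kernel's cmpxchg(), and the TLB invalidation that a real
break-before-make sequence needs between the two steps is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t pte_t;

/* Hypothetical marker: an invalid PTE (bit 0 clear) meaning "owned by a walker". */
#define PTE_LOCKED ((pte_t)1 << 10)

/*
 * 'Break': replace the PTE we observed with the locked marker.
 * Fails if the PTE is already locked or changed since it was read.
 */
static bool pte_try_break(_Atomic pte_t *ptep, pte_t old)
{
	if (old == PTE_LOCKED)
		return false;
	return atomic_compare_exchange_strong(ptep, &old, PTE_LOCKED);
}

/* 'Make': publish the new PTE. Only the walker that won the break calls this. */
static void pte_make(_Atomic pte_t *ptep, pte_t new)
{
	atomic_store_explicit(ptep, new, memory_order_release);
}

/* Two walkers race on the same PTE; only the first break succeeds. */
static void demo_two_walkers(void)
{
	_Atomic pte_t pte = 0x1001;          /* arbitrary valid entry */
	pte_t seen = atomic_load(&pte);      /* both walkers read the same value */

	assert(pte_try_break(&pte, seen));   /* walker A wins the PTE */
	assert(!pte_try_break(&pte, seen));  /* walker B must back off and retry */
	pte_make(&pte, 0x2003);              /* walker A installs the new entry */
	assert(atomic_load(&pte) == 0x2003);
}
```

The losing walker never blocks: it sees the failed cmpxchg and can simply
return and let the fault be retried.
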
> Finally, patch 14 flips the switch on all the new code and starts
> grabbing the read side of the MMU lock for stage 2 faults.
>
> Applies to 6.0-rc3. Tested with KVM selftests and benchmarked with
> dirty_log_perf_test, scaling from 1 to 48 vCPUs with 4GB of memory per
> vCPU backed by THP.
>
> ./dirty_log_perf_test -s anonymous_thp -m 2 -b 4G -v ${NR_VCPUS}
>
> Time to dirty memory:
>
> +-------+---------+------------------+
> | vCPUs | 6.0-rc3 | 6.0-rc3 + series |
> +-------+---------+------------------+
> |     1 |   0.89s |            0.92s |
> |     2 |   1.13s |            1.18s |
> |     4 |   2.42s |            1.25s |
> |     8 |   5.03s |            1.36s |
> |    16 |   8.84s |            2.09s |
> |    32 |  19.60s |            4.47s |
> |    48 |  31.39s |            6.22s |
> +-------+---------+------------------+
>
> It is also worth mentioning that the time to populate memory has
> improved:
>
> +-------+---------+------------------+
> | vCPUs | 6.0-rc3 | 6.0-rc3 + series |
> +-------+---------+------------------+
> |     1 |   0.19s |            0.18s |
> |     2 |   0.25s |            0.21s |
> |     4 |   0.38s |            0.32s |
> |     8 |   0.64s |            0.40s |
> |    16 |   1.22s |            0.54s |
> |    32 |   2.50s |            1.03s |
> |    48 |   3.88s |            1.52s |
> +-------+---------+------------------+
>
> RFC: https://lore.kernel.org/kvmarm/20220415215901.1737897-1-oupton@google.com/
>
> RFC -> v1:
> - Factored out page table teardown from kvm_pgtable_stage2_map()
> - Use the RCU callback to tear down a subtree, instead of scheduling a
>   callback for every individual table page.
> - Reorganized series to (hopefully) avoid intermediate breakage.
> - Dropped the use of page headers, instead stuffing KVM metadata into
>   page::private directly
>
> Oliver Upton (14):
>   KVM: arm64: Add a helper to tear down unlinked stage-2 subtrees
>   KVM: arm64: Tear down unlinked stage-2 subtree after break-before-make
>   KVM: arm64: Directly read owner id field in stage2_pte_is_counted()
>   KVM: arm64: Read the PTE once per visit
>   KVM: arm64: Split init and set for table PTE
>   KVM: arm64: Return next table from map callbacks
>   KVM: arm64: Document behavior of pgtable visitor callback
>   KVM: arm64: Protect page table traversal with RCU
>   KVM: arm64: Free removed stage-2 tables in RCU callback
>   KVM: arm64: Atomically update stage 2 leaf attributes in parallel
>     walks
>   KVM: arm64: Make changes block->table to leaf PTEs parallel-aware
>   KVM: arm64: Make leaf->leaf PTE changes parallel-aware
>   KVM: arm64: Make table->block changes parallel-aware
>   KVM: arm64: Handle stage-2 faults in parallel
>  arch/arm64/include/asm/kvm_pgtable.h  |  59 ++++-
>  arch/arm64/kvm/hyp/nvhe/mem_protect.c |   7 +-
>  arch/arm64/kvm/hyp/nvhe/setup.c       |   4 +-
>  arch/arm64/kvm/hyp/pgtable.c          | 360 ++++++++++++++++----------
>  arch/arm64/kvm/mmu.c                  |  65 +++--
>  5 files changed, 325 insertions(+), 170 deletions(-)

This fails to build on -rc4:

MODPOST vmlinux.symvers
MODINFO modules.builtin.modinfo
GEN modules.builtin
CC .vmlinux.export.o
LD .tmp_vmlinux.kallsyms1
ld: Unexpected GOT/PLT entries detected!
ld: Unexpected run-time procedure linkages detected!
ld: ID map text too big or misaligned
ld: arch/arm64/kvm/hyp/nvhe/kvm_nvhe.o: in function `__kvm_nvhe_kvm_pgtable_walk':
(.hyp.text+0xdc0c): undefined reference to `__kvm_nvhe___rcu_read_lock'
ld: (.hyp.text+0xdc1c): undefined reference to `__kvm_nvhe___rcu_read_unlock'
ld: arch/arm64/kvm/hyp/nvhe/kvm_nvhe.o: in function `__kvm_nvhe_kvm_pgtable_get_leaf':
(.hyp.text+0xdc80): undefined reference to `__kvm_nvhe___rcu_read_lock'
ld: (.hyp.text+0xdc90): undefined reference to `__kvm_nvhe___rcu_read_unlock'
ld: arch/arm64/kvm/hyp/nvhe/kvm_nvhe.o: in function `__kvm_nvhe_kvm_pgtable_hyp_map':
(.hyp.text+0xddb0): undefined reference to `__kvm_nvhe___rcu_read_lock'
ld: (.hyp.text+0xddc0): undefined reference to `__kvm_nvhe___rcu_read_unlock'
ld: arch/arm64/kvm/hyp/nvhe/kvm_nvhe.o: in function `__kvm_nvhe_kvm_pgtable_hyp_unmap':
(.hyp.text+0xde44): undefined reference to `__kvm_nvhe___rcu_read_lock'
ld: (.hyp.text+0xde50): undefined reference to `__kvm_nvhe___rcu_read_unlock'
ld: arch/arm64/kvm/hyp/nvhe/kvm_nvhe.o: in function `__kvm_nvhe_kvm_pgtable_hyp_destroy':
(.hyp.text+0xdf40): undefined reference to `__kvm_nvhe___rcu_read_lock'
ld: (.hyp.text+0xdf50): undefined reference to `__kvm_nvhe___rcu_read_unlock'
ld: arch/arm64/kvm/hyp/nvhe/kvm_nvhe.o: in function `__kvm_nvhe_kvm_pgtable_stage2_map':
(.hyp.text+0xe16c): undefined reference to `__kvm_nvhe___rcu_read_lock'
ld: (.hyp.text+0xe17c): undefined reference to `__kvm_nvhe___rcu_read_unlock'
ld: arch/arm64/kvm/hyp/nvhe/kvm_nvhe.o: in function `__kvm_nvhe_kvm_pgtable_stage2_set_owner':
(.hyp.text+0xe264): undefined reference to `__kvm_nvhe___rcu_read_lock'
ld: (.hyp.text+0xe274): undefined reference to `__kvm_nvhe___rcu_read_unlock'
ld: arch/arm64/kvm/hyp/nvhe/kvm_nvhe.o: in function `__kvm_nvhe_kvm_pgtable_stage2_unmap':
(.hyp.text+0xe2d4): undefined reference to `__kvm_nvhe___rcu_read_lock'
ld: (.hyp.text+0xe2e4): undefined reference to `__kvm_nvhe___rcu_read_unlock'
ld: arch/arm64/kvm/hyp/nvhe/kvm_nvhe.o: in function `__kvm_nvhe_kvm_pgtable_stage2_flush':
(.hyp.text+0xe5b4): undefined reference to `__kvm_nvhe___rcu_read_lock'
ld: (.hyp.text+0xe5c4): undefined reference to `__kvm_nvhe___rcu_read_unlock'
ld: arch/arm64/kvm/hyp/nvhe/kvm_nvhe.o: in function `__kvm_nvhe_kvm_pgtable_stage2_destroy':
(.hyp.text+0xe6f0): undefined reference to `__kvm_nvhe___rcu_read_lock'
ld: (.hyp.text+0xe700): undefined reference to `__kvm_nvhe___rcu_read_unlock'
make[3]: *** [Makefile:1169: vmlinux] Error 1
make[2]: *** [debian/rules:7: build-arch] Error 2

as this drags the RCU read-lock into EL2, and that's not going to
work... The following fixes it, but I wonder how you tested it.

Thanks,

	M.

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index dc839db86a1a..adf170122daf 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -580,7 +580,7 @@ enum kvm_pgtable_prot kvm_pgtable_stage2_pte_prot(kvm_pte_t pte);
*/
enum kvm_pgtable_prot kvm_pgtable_hyp_pte_prot(kvm_pte_t pte);
-#if defined(__KVM_NVHE_HYPERVISOR___)
+#if defined(__KVM_NVHE_HYPERVISOR__)
static inline void kvm_pgtable_walk_begin(void) {}
static inline void kvm_pgtable_walk_end(void) {}
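
To illustrate why the stray underscore in the guard above went unnoticed by
the compiler: `#if defined(X)` on a name that is never defined is simply
false, with no diagnostic, so the build takes the other branch. A minimal
standalone sketch follows. DEMO_NVHE_HYPERVISOR and the walk_begin_*()
helpers are hypothetical stand-ins, not the kernel's symbols, and the
strings only name which branch was compiled in.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the flag a hypervisor-only build would define. */
#define DEMO_NVHE_HYPERVISOR 1

/* Misspelled guard (extra trailing underscore): never defined, silently false. */
#if defined(DEMO_NVHE_HYPERVISOR_)
static const char *walk_begin_typo(void) { return "no-op"; }
#else
static const char *walk_begin_typo(void) { return "rcu_read_lock"; }
#endif

/* Correctly spelled guard: the hypervisor build gets the empty stub. */
#if defined(DEMO_NVHE_HYPERVISOR)
static const char *walk_begin_fixed(void) { return "no-op"; }
#else
static const char *walk_begin_fixed(void) { return "rcu_read_lock"; }
#endif

/* Nonzero when the two guards selected different branches. */
static int guards_differ(void)
{
	return strcmp(walk_begin_typo(), walk_begin_fixed()) != 0;
}

/* Sanity check: with the flag defined, only the fixed guard picks the stub. */
static void self_check(void)
{
	assert(guards_differ());
}
```

With the misspelled guard the hypervisor object ends up referencing the RCU
helpers, which is what the link errors above complain about.
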
--
Without deviation from the norm, progress is not possible.