[PATCH v2 07/15] KVM: arm64: Use an opaque type for pteps
Sean Christopherson
seanjc at google.com
Wed Oct 19 16:17:43 PDT 2022
On Fri, Oct 07, 2022, Oliver Upton wrote:
> Use an opaque type for pteps and require visitors explicitly dereference
> the pointer before using. Protecting page table memory with RCU requires
> that KVM dereferences RCU-annotated pointers before using. However, RCU
> is not available for use in the nVHE hypervisor and the opaque type can
> be conditionally annotated with RCU for the stage-2 MMU.
>
> Call the type a 'pteref' to avoid a naming collision with raw pteps. No
> functional change intended.
>
> Signed-off-by: Oliver Upton <oliver.upton at linux.dev>
> ---
> arch/arm64/include/asm/kvm_pgtable.h | 9 ++++++++-
> arch/arm64/kvm/hyp/pgtable.c | 23 ++++++++++++-----------
> 2 files changed, 20 insertions(+), 12 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> index c33edcf36b5b..beb89eac155c 100644
> --- a/arch/arm64/include/asm/kvm_pgtable.h
> +++ b/arch/arm64/include/asm/kvm_pgtable.h
> @@ -25,6 +25,13 @@ static inline u64 kvm_get_parange(u64 mmfr0)
>
> typedef u64 kvm_pte_t;
>
> +typedef kvm_pte_t *kvm_pteref_t;
> +
> +static inline kvm_pte_t *kvm_dereference_pteref(kvm_pteref_t pteref, bool shared)
> +{
> + return pteref;
Returning the pointer is unsafe (when it becomes RCU-protected). The full
dereference of the data needs to occur under RCU protection, not just the retrieval
of the pointer. E.g. this (straw man) would be broken:

	bool table = kvm_pte_table(ctx.old, level);

	rcu_read_lock();
	ptep = kvm_dereference_pteref(pteref, flags & KVM_PGTABLE_WALK_SHARED);
	rcu_read_unlock();

	if (table && (ctx.flags & KVM_PGTABLE_WALK_TABLE_PRE))
		ret = kvm_pgtable_visitor_cb(data, &ctx, KVM_PGTABLE_WALK_TABLE_PRE);

	if (!table && (ctx.flags & KVM_PGTABLE_WALK_LEAF)) {
		ret = kvm_pgtable_visitor_cb(data, &ctx, KVM_PGTABLE_WALK_LEAF);
		ctx.old = READ_ONCE(*ptep);
		table = kvm_pte_table(ctx.old, level);
	}
as the read of the entry pointed at by ptep could hit a page table that has
already been freed by an RCU callback.
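What's needed instead is for the read-side critical section to cover every
access to the entry, e.g. something along these lines (an untested sketch
with rcu_read_lock() hoisted to span the whole visit):

	rcu_read_lock();

	ptep = kvm_dereference_pteref(pteref, flags & KVM_PGTABLE_WALK_SHARED);
	ctx.old = READ_ONCE(*ptep);
	table = kvm_pte_table(ctx.old, level);

	if (table && (ctx.flags & KVM_PGTABLE_WALK_TABLE_PRE))
		ret = kvm_pgtable_visitor_cb(data, &ctx, KVM_PGTABLE_WALK_TABLE_PRE);

	if (!table && (ctx.flags & KVM_PGTABLE_WALK_LEAF)) {
		ret = kvm_pgtable_visitor_cb(data, &ctx, KVM_PGTABLE_WALK_LEAF);
		ctx.old = READ_ONCE(*ptep);
		table = kvm_pte_table(ctx.old, level);
	}

	rcu_read_unlock();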
The naming collision you are trying to avoid is a symptom of this bad pattern,
as there should never be "raw" pteps floating around, at least not in non-pKVM
contexts that utilize RCU.
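As a bonus, once the type carries an __rcu annotation, sparse (make C=2)
will yell about any raw dereference that sneaks through, e.g. (hypothetical
example, just to illustrate the checker coverage):

	typedef kvm_pte_t __rcu *kvm_ptep_t;

	static kvm_pte_t bad_read(kvm_ptep_t ptep)
	{
		return *ptep;	/* sparse: dereference of noderef expression */
	}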
> +}
> +
> #define KVM_PTE_VALID BIT(0)
>
> #define KVM_PTE_ADDR_MASK GENMASK(47, PAGE_SHIFT)
> @@ -170,7 +177,7 @@ typedef bool (*kvm_pgtable_force_pte_cb_t)(u64 addr, u64 end,
> struct kvm_pgtable {
> u32 ia_bits;
> u32 start_level;
> - kvm_pte_t *pgd;
> + kvm_pteref_t pgd;
> struct kvm_pgtable_mm_ops *mm_ops;
>
> /* Stage-2 only */
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index 02c33fccb178..6b6e1ed7ee2f 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -175,13 +175,14 @@ static int kvm_pgtable_visitor_cb(struct kvm_pgtable_walk_data *data,
> }
>
> static int __kvm_pgtable_walk(struct kvm_pgtable_walk_data *data,
> - struct kvm_pgtable_mm_ops *mm_ops, kvm_pte_t *pgtable, u32 level);
> + struct kvm_pgtable_mm_ops *mm_ops, kvm_pteref_t pgtable, u32 level);
>
> static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
> struct kvm_pgtable_mm_ops *mm_ops,
> - kvm_pte_t *ptep, u32 level)
> + kvm_pteref_t pteref, u32 level)
> {
> enum kvm_pgtable_walk_flags flags = data->walker->flags;
> + kvm_pte_t *ptep = kvm_dereference_pteref(pteref, false);
> struct kvm_pgtable_visit_ctx ctx = {
> .ptep = ptep,
> .old = READ_ONCE(*ptep),
This is where you want the protection to kick in, e.g.
	typedef kvm_pte_t __rcu *kvm_ptep_t;

	static inline kvm_pte_t kvm_read_pte(kvm_ptep_t ptep)
	{
		return READ_ONCE(*rcu_dereference(ptep));
	}

	.old = kvm_read_pte(ptep),
In other words, it isn't the pointer itself that's protected, it's the PTE
that the pointer points at that's protected.
rcu_dereference() has no overhead when CONFIG_PROVE_RCU=n, i.e. there's no reason
to "optimize" dereferences.