[PATCH v3 12/15] KVM: arm64: Make block->table PTE changes parallel-aware

Oliver Upton oliver.upton at linux.dev
Wed Nov 2 16:03:27 PDT 2022


On Tue, Nov 01, 2022 at 07:22:49PM -0700, Ricardo Koller wrote:
> On Thu, Oct 27, 2022 at 10:22:47PM +0000, Oliver Upton wrote:
> > In order to service stage-2 faults in parallel, stage-2 table walkers
> > must take exclusive ownership of the PTE being worked on. An additional
> > requirement of the architecture is that software must perform a
> > 'break-before-make' operation when changing the block size used for
> > mapping memory.
> > 
> > Roll these two concepts together into helpers for performing a
> > 'break-before-make' sequence. Use a special PTE value to indicate a PTE
> > has been locked by a software walker. Additionally, use an atomic
> > compare-exchange to 'break' the PTE when the stage-2 page tables are
> > possibly shared with another software walker. Elide the DSB + TLBI if
> > the evicted PTE was invalid (and thus not subject to break-before-make).
> > 
> > All of the atomics do nothing for now, as the stage-2 walker isn't fully
> > ready to perform parallel walks.
> > 
> > Signed-off-by: Oliver Upton <oliver.upton at linux.dev>
> > ---
> >  arch/arm64/kvm/hyp/pgtable.c | 82 +++++++++++++++++++++++++++++++++---
> >  1 file changed, 76 insertions(+), 6 deletions(-)
> > 
> > diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> > index 4c579b3beabf..1df858c21b2e 100644
> > --- a/arch/arm64/kvm/hyp/pgtable.c
> > +++ b/arch/arm64/kvm/hyp/pgtable.c
> > @@ -49,6 +49,12 @@
> >  #define KVM_INVALID_PTE_OWNER_MASK	GENMASK(9, 2)
> >  #define KVM_MAX_OWNER_ID		1
> >  
> > +/*
> > + * Used to indicate a pte for which a 'break-before-make' sequence is in
> > + * progress.
> > + */
> > +#define KVM_INVALID_PTE_LOCKED		BIT(10)
> > +
> >  struct kvm_pgtable_walk_data {
> >  	struct kvm_pgtable_walker	*walker;
> >  
> > @@ -674,6 +680,11 @@ static bool stage2_pte_is_counted(kvm_pte_t pte)
> >  	return !!pte;
> >  }
> >  
> > +static bool stage2_pte_is_locked(kvm_pte_t pte)
> > +{
> > +	return !kvm_pte_valid(pte) && (pte & KVM_INVALID_PTE_LOCKED);
> > +}
> > +
> >  static bool stage2_try_set_pte(const struct kvm_pgtable_visit_ctx *ctx, kvm_pte_t new)
> >  {
> >  	if (!kvm_pgtable_walk_shared(ctx)) {
> > @@ -684,6 +695,64 @@ static bool stage2_try_set_pte(const struct kvm_pgtable_visit_ctx *ctx, kvm_pte_
> >  	return cmpxchg(ctx->ptep, ctx->old, new) == ctx->old;
> >  }
> >  
> > +/**
> > + * stage2_try_break_pte() - Invalidates a pte according to the
> > + *			    'break-before-make' requirements of the
> > + *			    architecture.
> > + *
> > + * @ctx: context of the visited pte.
> > + * @data: stage-2 map data
> > + *
> > + * Returns: true if the pte was successfully broken.
> > + *
> > + * If the removed pte was valid, performs the necessary serialization and TLB
> > + * invalidation for the old value. For counted ptes, drops the reference count
> > + * on the containing table page.
> > + */
> > +static bool stage2_try_break_pte(const struct kvm_pgtable_visit_ctx *ctx,
> > +				 struct stage2_map_data *data)
> 
> Would it be possible to pass "kvm_s2_mmu *mmu" directly (instead of
> "stage2_map_data *data")? so this function can be reused by other
> walkers.

Sure, and I presume the ask is coming because you're layering eager page
splitting on top of this right? :-)

> Another option would be to stash "struct kvm_s2_mmu" in
> "struct kvm_pgtable_visit_ctx".

I don't think we'd want to do that. kvm_pgtable_visit_ctx is shared
amongst all walkers, including the hypervisor stage-1.

--
Thanks,
Oliver



More information about the linux-arm-kernel mailing list