[PATCH 1/3] KVM: arm64: Don't defer TLB invalidation when zapping table entries

Will Deacon will at kernel.org
Tue Mar 26 09:10:01 PDT 2024


On Tue, Mar 26, 2024 at 07:31:27AM -0700, Oliver Upton wrote:
> On Tue, Mar 26, 2024 at 01:34:17AM -0700, Oliver Upton wrote:
> > >  	}
> > >  
> > >  	mm_ops->put_page(ctx->ptep);
> > 
> > At least for the 'normal' MMU where we use RCU, this could be changed to
> > ->free_unlinked_table(), which would defer the freeing of memory until
> > after the invalidation completes. But that still hoses pKVM's stage-2
> > MMU, which frees in place.
> 
> How about this (untested) diff? I _think_ it should address the
> invalidation issue while leaving the performance optimization in place
> for a 'normal' stage-2.
> 
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index 3fae5830f8d2..896fdc0d157d 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -872,14 +872,19 @@ static void stage2_make_pte(const struct kvm_pgtable_visit_ctx *ctx, kvm_pte_t n
>  static bool stage2_unmap_defer_tlb_flush(struct kvm_pgtable *pgt)
>  {
>  	/*
> -	 * If FEAT_TLBIRANGE is implemented, defer the individual
> -	 * TLB invalidations until the entire walk is finished, and
> -	 * then use the range-based TLBI instructions to do the
> -	 * invalidations. Condition deferred TLB invalidation on the
> -	 * system supporting FWB as the optimization is entirely
> -	 * pointless when the unmap walker needs to perform CMOs.
> +	 * It is possible to use FEAT_TLBIRANGE to do TLB invalidations at the
> +	 * end of the walk if certain conditions are met:
> +	 *
> +	 *  - The stage-2 is for a 'normal' VM (i.e. managed in the kernel
> +	 *    context). RCU provides sufficient guarantees to ensure that all
> +	 *    hardware and software references on the stage-2 page tables are
> +	 *    relinquished before freeing a table page.
> +	 *
> +	 *  - The system supports FEAT_FWB. Otherwise, KVM needs to do CMOs
> +	 *    during the page table walk.
>  	 */
> -	return system_supports_tlb_range() && stage2_has_fwb(pgt);
> +	return !is_hyp_code() && system_supports_tlb_range() &&
> +		stage2_has_fwb(pgt);
>  }
>  
>  static void stage2_unmap_put_pte(const struct kvm_pgtable_visit_ctx *ctx,
> @@ -1163,7 +1168,7 @@ static int stage2_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
>  					       kvm_granule_size(ctx->level));
>  
>  	if (childp)
> -		mm_ops->put_page(childp);
> +		mm_ops->free_unlinked_table(childp, ctx->level);

Hmm, but doesn't the deferred TLBI still happen after the RCU critical
section?

I also think I found another bug, so I'll send a v2 with an extra patch...

Will


