[PATCH 2/2] KVM: arm64: Avoid soft lockups due to I-cache maintenance

Gavin Shan gshan at redhat.com
Wed Sep 20 20:28:06 PDT 2023


On 9/20/23 18:01, Oliver Upton wrote:
> Gavin reports soft lockups on his Ampere Altra Max machine when
> backing KVM guests with hugetlb pages. Upon further investigation, it
> was found that the system is unable to keep up with parallel I-cache
> invalidations done by KVM's stage-2 fault handler.
> 
> This is ultimately an implementation problem. I-cache maintenance
> instructions are available at EL0, so nothing stops a malicious
> userspace from hammering a system with CMOs and causing it to fall over.
> "Fixing" this problem in KVM is nothing more than slapping a bandage
> over a much deeper problem.
> 
> Anyway, the kernel already has a heuristic for limiting TLB
> invalidations to avoid soft lockups. Reuse that logic to limit I-cache
> CMOs done by KVM to map executable pages on systems without FEAT_DIC.
> While at it, restructure __invalidate_icache_guest_page() to improve
> readability and squeeze our new condition into the existing branching
> structure.
> 
> Link: https://lore.kernel.org/kvmarm/20230904072826.1468907-1-gshan@redhat.com/
> Signed-off-by: Oliver Upton <oliver.upton at linux.dev>
> ---
>   arch/arm64/include/asm/kvm_mmu.h | 37 ++++++++++++++++++++++++++------
>   1 file changed, 31 insertions(+), 6 deletions(-)
> 
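
For the archives, a note on where the limit actually lands. CTR_EL0.IminLine
(bits [3:0]) is log2 of the smallest I-cache line size in words, so the "+ 2"
in the new helper turns it into log2 of bytes. A rough userspace sketch of the
same computation follows; the 512 for MAX_DVM_OPS assumes it resolves to
PTRS_PER_PTE with 4K pages, and the constants here are mine rather than
anything taken from the patch:

#include <stdint.h>
#include <stdio.h>

/* Assumption: MAX_DVM_OPS resolves to PTRS_PER_PTE (512 with 4K pages). */
#define MAX_DVM_OPS	512ULL

static uint64_t icache_inval_max_range(uint64_t ctr_el0)
{
	/* IminLine is CTR_EL0[3:0]: log2(words); "+ 2" makes it log2(bytes). */
	unsigned int iminline = (ctr_el0 & 0xf) + 2;

	return MAX_DVM_OPS << iminline;
}

int main(void)
{
	/* IminLine = 4, i.e. 16-word (64-byte) I-cache lines. */
	printf("max range: %llu bytes\n",
	       (unsigned long long)icache_inval_max_range(0x4));
	return 0;
}

With 64-byte lines that comes to 512 << 6 = 32K, so any mapping larger than a
handful of pages takes the icache_inval_all_pou() path instead of looping over
it line by line.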

Reviewed-by: Gavin Shan <gshan at redhat.com>
Tested-by: Gavin Shan <gshan at redhat.com>

> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index 96a80e8f6226..a425ecdd7be0 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -224,16 +224,41 @@ static inline void __clean_dcache_guest_page(void *va, size_t size)
>   	kvm_flush_dcache_to_poc(va, size);
>   }
>   
> +static inline size_t __invalidate_icache_max_range(void)
> +{
> +	u8 iminline;
> +	u64 ctr;
> +
> +	asm volatile(ALTERNATIVE_CB("movz %0, #0\n"
> +				    "movk %0, #0, lsl #16\n"
> +				    "movk %0, #0, lsl #32\n"
> +				    "movk %0, #0, lsl #48\n",
> +				    ARM64_ALWAYS_SYSTEM,
> +				    kvm_compute_final_ctr_el0)
> +		     : "=r" (ctr));
> +
> +	iminline = SYS_FIELD_GET(CTR_EL0, IminLine, ctr) + 2;
> +	return MAX_DVM_OPS << iminline;
> +}
> +
>   static inline void __invalidate_icache_guest_page(void *va, size_t size)
>   {
> -	if (icache_is_aliasing()) {
> -		/* any kind of VIPT cache */
> +	/*
> +	 * VPIPT I-cache maintenance must be done from EL2. See comment in the
> +	 * nVHE flavor of __kvm_tlb_flush_vmid_ipa().
> +	 */
> +	if (icache_is_vpipt() && read_sysreg(CurrentEL) != CurrentEL_EL2)
> +		return;
> +
> +	/*
> +	 * Blow the whole I-cache if it is aliasing (i.e. VIPT) or the
> +	 * invalidation range exceeds our arbitrary limit on invalidations by
> +	 * cache line.
> +	 */
> +	if (icache_is_aliasing() || size > __invalidate_icache_max_range())
>   		icache_inval_all_pou();
> -	} else if (read_sysreg(CurrentEL) != CurrentEL_EL1 ||
> -		   !icache_is_vpipt()) {
> -		/* PIPT or VPIPT at EL2 (see comment in __kvm_tlb_flush_vmid_ipa) */
> +	else
>   		icache_inval_pou((unsigned long)va, (unsigned long)va + size);
> -	}
>   }
>   
>   void kvm_set_way_flush(struct kvm_vcpu *vcpu);
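
For completeness, the behaviour change that matters for the report: exec
faults on hugetlb-backed memory invalidate the whole mapping in one go, and
anything block-sized blows well past the limit above, so the new code ends up
in icache_inval_all_pou() rather than walking the range by cache line. A
trivial sketch of that decision, with the same assumed constants as before
(2M used purely as an example block size):

#include <stdio.h>

#define MAX_DVM_OPS	512UL		/* assumed: PTRS_PER_PTE, 4K pages */
#define IMINLINE	6UL		/* assumed: 64-byte I-cache lines */
#define PMD_SIZE	(2UL << 20)	/* example 2M block mapping */

int main(void)
{
	unsigned long max_range = MAX_DVM_OPS << IMINLINE;

	if (PMD_SIZE > max_range)
		printf("2M exec fault: full I-cache invalidate\n");
	else
		printf("2M exec fault: invalidate by line\n");
	return 0;
}

which matches what we want on the Altra Max.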
