[RFC PATCH] arm64: Cache HW update of Access Flag support

Will Deacon will at kernel.org
Fri Jan 6 09:09:50 PST 2023


On Fri, Jan 06, 2023 at 01:38:25PM -0300, Gabriel Krisman Bertazi wrote:
> Accessing ID_AA64MMFR1_EL1 is expensive in KVM guests, since reads of
> it are emulated in the hypervisor.  In fact, the ARM documentation
> mentions that some feature registers are not expected to be accessed
> frequently by the OS, and should therefore be emulated for guests [1].
> 
> Commit 0388f9c74330 ("arm64: mm: Implement
> arch_wants_old_prefaulted_pte()") introduced a read of this register in
> the page fault path.  But even when the feature of setting faultaround
> pages with the old flag is not supported by a given CPU, we still pay
> the cost of reading the register on every page fault.  This results in
> an explosion of vmexit events in KVM guests, which directly impacts the
> performance of virtualized workloads.  For instance, running kernbench
> yields a 15% increase in system time solely due to the increased vmexit
> cycles.
> 
> This patch avoids the extra cost by caching the register value after
> the first read.  It should be safe to do so, since this register cannot
> change for a given processor.
> 
> As a side note, we can't rely on the usual cpucap for this bit because,
> for a reason I haven't tracked down, it is always set on aarch64, even
> if the functionality is not supported by the processor.
> 
> [1] https://developer.arm.com/-/media/Arm%20Developer%20Community/PDF/Learn%20the%20Architecture/Armv8-A%20virtualization.pdf?revision=a765a7df-1a00-434d-b241-357bfda2dd31
> 
> Signed-off-by: Gabriel Krisman Bertazi <krisman at suse.de>
> ---
>  arch/arm64/include/asm/cpufeature.h | 16 +++++++++++++---
>  1 file changed, 13 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
> index 03d1c9d7af82..599c7a22155b 100644
> --- a/arch/arm64/include/asm/cpufeature.h
> +++ b/arch/arm64/include/asm/cpufeature.h
> @@ -859,14 +859,24 @@ static inline u32 id_aa64mmfr0_parange_to_phys_shift(int parange)
>  /* Check whether hardware update of the Access flag is supported */
>  static inline bool cpu_has_hw_af(void)
>  {
> -	u64 mmfr1;
> +	static unsigned int has_af = -1U;
>  
>  	if (!IS_ENABLED(CONFIG_ARM64_HW_AFDBM))
>  		return false;
>  
> -	mmfr1 = read_cpuid(ID_AA64MMFR1_EL1);
> -	return cpuid_feature_extract_unsigned_field(mmfr1,
> +	/*
> +	 * Cache the register to avoid repeatedly reading it,
> +	 * which is an emulated operation on KVM guests.
> +	 *
> +	 * Also do not rely on cpucap, since this bit is always set
> +	 * there, regardless of actual status. See has_hw_dbm() for
> +	 * details.
> +	 */
> +	if (unlikely(has_af == -1U))
> +		has_af = cpuid_feature_extract_unsigned_field(
> +						read_cpuid(ID_AA64MMFR1_EL1),
>  						ID_AA64MMFR1_EL1_HAFDBS_SHIFT);
> +	return has_af;

The intention here was to read the value for the _current_ CPU, since it
might not be the same across the system when big.MISTAKE gets involved.

However, since this is really just a performance optimisation and I
think that the access flag tends to be uniformly supported in practice
anyway, the best bet is probably just to read the sanitised version of
the field using read_sanitised_ftr_reg().
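
Something like the below, perhaps (a sketch only, completely untested;
it assumes the existing SYS_ID_AA64MMFR1_EL1 and
ID_AA64MMFR1_EL1_HAFDBS_SHIFT definitions):

static inline bool cpu_has_hw_af(void)
{
	u64 mmfr1;

	if (!IS_ENABLED(CONFIG_ARM64_HW_AFDBM))
		return false;

	/*
	 * Use the sanitised, system-wide copy of the register, which is
	 * computed at boot and doesn't trap to the hypervisor on KVM
	 * guests.
	 */
	mmfr1 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1);
	return cpuid_feature_extract_unsigned_field(mmfr1,
						ID_AA64MMFR1_EL1_HAFDBS_SHIFT);
}

If I remember the sanitisation rules correctly, this also sidesteps the
big.MISTAKE problem above, at the cost of only reporting the feature
when every CPU in the system supports it.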

Can you give that a shot, please?

Will
