[RFC PATCH] arm64: Cache HW update of Access Flag support
Will Deacon
will at kernel.org
Fri Jan 6 09:09:50 PST 2023
On Fri, Jan 06, 2023 at 01:38:25PM -0300, Gabriel Krisman Bertazi wrote:
> Accessing ID_AA64MMFR1_EL1 is expensive in KVM guests, since reads of
> it are trapped and emulated by the hypervisor. In fact, the Arm
> documentation notes that some feature registers are not expected to be
> accessed frequently by the OS, and may therefore be emulated for
> guests [1].
>
> Commit 0388f9c74330 ("arm64: mm: Implement
> arch_wants_old_prefaulted_pte()") introduced a read of this register in
> the page fault path. But even when a given CPU does not support marking
> faultaround pages with the old flag, we still pay the cost of reading
> the register on every page fault. This results in an
> explosion of vmexit events in KVM guests, which directly impacts the
> performance of virtualized workloads. For instance, running kernbench
> yields a 15% increase in system time solely due to the increased vmexit
> cycles.
>
> This patch avoids the extra cost by caching the register value after the
> first read. It should be safe to do so, since the value of this
> register cannot change for a given processor.
>
> As a side note, we can't rely on the usual cpucap for this bit, because
> for some reason that I'm missing, it is always enabled for aarch64, even if
> the functionality is not supported by the processor.
>
> [1] https://developer.arm.com/-/media/Arm%20Developer%20Community/PDF/Learn%20the%20Architecture/Armv8-A%20virtualization.pdf?revision=a765a7df-1a00-434d-b241-357bfda2dd31
>
> Signed-off-by: Gabriel Krisman Bertazi <krisman at suse.de>
> ---
> arch/arm64/include/asm/cpufeature.h | 16 +++++++++++++---
> 1 file changed, 13 insertions(+), 3 deletions(-)
>
> diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
> index 03d1c9d7af82..599c7a22155b 100644
> --- a/arch/arm64/include/asm/cpufeature.h
> +++ b/arch/arm64/include/asm/cpufeature.h
> @@ -859,14 +859,24 @@ static inline u32 id_aa64mmfr0_parange_to_phys_shift(int parange)
> /* Check whether hardware update of the Access flag is supported */
> static inline bool cpu_has_hw_af(void)
> {
> - u64 mmfr1;
> + static unsigned int has_af = -1U;
>
> if (!IS_ENABLED(CONFIG_ARM64_HW_AFDBM))
> return false;
>
> - mmfr1 = read_cpuid(ID_AA64MMFR1_EL1);
> - return cpuid_feature_extract_unsigned_field(mmfr1,
> + /*
> + * Cache the register to avoid repeatedly reading it,
> + * which is an emulated operation on KVM guests.
> + *
> + * Also do not rely on cpucap, since this bit is always set
> + * there, regardless of actual status. See has_hw_dbm() for
> + * details.
> + */
> + if (unlikely(has_af == -1U))
> + has_af = cpuid_feature_extract_unsigned_field(
> + read_cpuid(ID_AA64MMFR1_EL1),
> ID_AA64MMFR1_EL1_HAFDBS_SHIFT);
> + return has_af;
The intention here was to read the value for the _current_ CPU, since it
might not be the same across the system when big.MISTAKE gets involved.
However, since this is really just a performance optimisation, and I
think that hardware Access flag updates tend to be uniformly supported
in practice anyway, the best bet is probably just to read the sanitised
version of the field using read_sanitised_ftr_reg().
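Completely untested sketch of what I mean below. This assumes the
sanitised copy of ID_AA64MMFR1_EL1 has been populated by the time we
take our first page fault, which should be the case given how early the
cpufeature code runs during boot:

/* Check whether hardware update of the Access flag is supported */
static inline bool cpu_has_hw_af(void)
{
	u64 mmfr1;

	if (!IS_ENABLED(CONFIG_ARM64_HW_AFDBM))
		return false;

	/*
	 * Use the sanitised, system-wide value of the register rather
	 * than read_cpuid(), which traps to the hypervisor when running
	 * as a KVM guest.
	 */
	mmfr1 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1);
	return cpuid_feature_extract_unsigned_field(mmfr1,
					ID_AA64MMFR1_EL1_HAFDBS_SHIFT);
}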
Can you give that a shot, please?
Will