[PATCH v3 00/38] arm64: Remove cpus_have_const_cap()
Mark Rutland
mark.rutland at arm.com
Tue Oct 10 03:31:01 PDT 2023
For historical reasons, cpus_have_const_cap() does more than its name
implies, and its current behaviour is more harmful than helpful. This
series removes cpus_have_const_cap(), removing some redundant code and
making the kernel more robust.
Currently, cpus_have_const_cap() is implemented as:
| static __always_inline bool cpus_have_const_cap(int num)
| {
| 	if (is_hyp_code())
| 		return cpus_have_final_cap(num);
| 	else if (system_capabilities_finalized())
| 		return __cpus_have_const_cap(num);
| 	else
| 		return cpus_have_cap(num);
| }
For hyp code this is safe and practically ideal. We finalize system
cpucaps and patch the relevant alternatives before KVM is initialized,
and so the alternative branch generated by cpus_have_final_cap() is
guaranteed to observe the finalized value of the cpucap.
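To make the three paths concrete, here is a small userspace model of the dispatch shown above. It is only a sketch: every model_* name is hypothetical, the "patched" state is a plain variable rather than a real alternative branch, and the is_hyp flag stands in for is_hyp_code(). The assert() in the hyp path models the assumption that cpucaps are finalized before hyp code runs.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NCAPS 64

/* Which of the three paths the last query took (for illustration only). */
enum model_path { MODEL_PATH_FINAL, MODEL_PATH_PATCHED, MODEL_PATH_BITMAP };

/* Hypothetical stand-ins for kernel state; none of these are kernel APIs. */
static unsigned long long model_detected_caps; /* bitmap of detected cpucaps */
static unsigned long long model_patched_caps;  /* snapshot taken when "patching" */
static bool model_finalized;                   /* system caps finalized? */
static enum model_path model_last_path;

static bool model_test_bit(unsigned long long map, int num)
{
	assert(num >= 0 && num < MODEL_NCAPS);
	return (map >> num) & 1ULL;
}

/* Models detection: the bitmap changes immediately. */
static void model_detect_cap(int num)
{
	assert(num >= 0 && num < MODEL_NCAPS);
	model_detected_caps |= 1ULL << num;
}

/* Models finalization plus alternative patching as a single step. */
static void model_finalize_caps(void)
{
	model_patched_caps = model_detected_caps;
	model_finalized = true;
}

/* Same three-way shape as cpus_have_const_cap() in the cover letter. */
static bool model_cpus_have_const_cap(int num, bool is_hyp)
{
	if (is_hyp) {
		/* Hyp code only runs after finalization in this model. */
		assert(model_finalized);
		model_last_path = MODEL_PATH_FINAL;
		return model_test_bit(model_patched_caps, num);
	} else if (model_finalized) {
		model_last_path = MODEL_PATH_PATCHED;
		return model_test_bit(model_patched_caps, num);
	} else {
		/* Before finalization: falls back to the live bitmap. */
		model_last_path = MODEL_PATH_BITMAP;
		return model_test_bit(model_detected_caps, num);
	}
}
```

Calling model_detect_cap(3) and then model_cpus_have_const_cap(3, false) takes the bitmap path. After model_finalize_caps() the same query takes the patched path, and a query with is_hyp set takes the final path.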
For non-hyp code this is potentially unsafe and sub-optimal:
1) System cpucaps are detected on the boot CPU while secondary CPUs are
executing code. This leads to potential races around cpucaps being
detected, where the cpucaps can change at arbitrary points in time,
potentially in the middle of sequences which depend on them not
changing, e.g.
    CPU 0                       CPU 1
                                // doesn't save PMR
                                flags = local_daif_save();
    // detects PSEUDO-NMI
                                // attempts to restore PMR
                                local_daif_restore(flags);
This can potentially lead to erratic behaviour, and for stateful
sequences it would be better to use alternatives such that the entire
sequence is patched atomically.
2) For several cpucaps we perform some enablement/initialization work
between detecting the cpucap and patching alternatives. For some
features (e.g. SVE and SME) we need to record some additional
properties (e.g. vector lengths) before patching alternatives.
If patched alternative sequences consume any of the recorded
properties, it's possible that these race with the
enablement/initialization and consume stale values, which could
potentially result in erratic behaviour. It would be better to use
alternatives such that the enablement/initialization is guaranteed to
happen before any such usage.
3) Most code doesn't run between cpucaps being detected and their
alternatives being patched, and will have redundant code generated,
with an alternative branch for system_capabilities_finalized(), and a
bitmap test for cpus_have_cap(). This bloats the kernel and wastes
I-cache resources, and the resulting branching structure pessimizes
compiler output.
This is especially noticeable in parts of the kernel which need to
test a number of cpucaps in quick succession, such as exception
handlers in entry-common.c and state save/restore in fpsimd.c. Using
alternative branches directly can dramatically improve the code
generated for such paths (e.g. making the entry code several KB
smaller in some configurations).
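The race in point 1 can be sketched as a deterministic, single-threaded userspace model. This is a deliberate simplification and not the kernel's local_daif_save()/local_daif_restore(): all model_* names and the PMR values are hypothetical, and the other CPU's detection is modelled as a call placed between the save and the restore. The racy variant re-reads the live "detected" flag in each half of the sequence. The patched variant reads a flag that only changes when model_patch_alternatives() runs, which in this model never happens in the middle of a sequence.

```c
#include <assert.h>
#include <stdbool.h>

/* Arbitrary illustrative priority-mask values, not taken from the kernel. */
#define MODEL_PMR_UNMASKED 0xe0u
#define MODEL_PMR_MASKED   0x60u

struct model_cpu {
	unsigned int pmr;
};

static bool model_detected; /* flips as soon as the cpucap is detected */
static bool model_patched;  /* flips only when alternatives are "patched" */

static void model_reset(void)
{
	model_detected = false;
	model_patched = false;
}

static void model_detect(void)
{
	model_detected = true;
}

/* Models patching at a point where no CPU is mid-sequence. */
static void model_patch_alternatives(void)
{
	model_patched = model_detected;
}

/* Saves and masks PMR only if the caller believes PMR is in use. */
static unsigned int model_save(struct model_cpu *cpu, bool use_pmr)
{
	unsigned int flags = 0;

	if (use_pmr) {
		flags = cpu->pmr;
		cpu->pmr = MODEL_PMR_MASKED;
	}
	return flags;
}

/* Restores PMR only if the caller believes PMR is in use. */
static void model_restore(struct model_cpu *cpu, unsigned int flags, bool use_pmr)
{
	if (use_pmr)
		cpu->pmr = flags;
}

/* Each half checks the live flag; detection lands in between. */
static unsigned int model_racy_sequence(void)
{
	struct model_cpu cpu = { .pmr = MODEL_PMR_UNMASKED };
	unsigned int flags = model_save(&cpu, model_detected);

	model_detect(); /* the other CPU detects the cpucap here */
	model_restore(&cpu, flags, model_detected);
	return cpu.pmr;
}

/* Both halves check the patched flag, which cannot change mid-sequence. */
static unsigned int model_patched_sequence(void)
{
	struct model_cpu cpu = { .pmr = MODEL_PMR_UNMASKED };
	unsigned int flags = model_save(&cpu, model_patched);

	model_detect(); /* detection alone does not alter the sequence */
	model_restore(&cpu, flags, model_patched);
	return cpu.pmr;
}
```

Starting from model_reset(), model_racy_sequence() restores a PMR value that was never saved, so the result differs from MODEL_PMR_UNMASKED. model_patched_sequence() leaves PMR intact both before and after model_patch_alternatives() is called.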
This series attempts to address the above issues by removing
cpus_have_const_cap() and migrating code over to alternative branches
wherever possible:
* Patches 1 to 2 address a couple of bugs I spotted where cpucaps
are consumed prior to being initialized.
* Patches 3 to 5 rework some low-level cpucap helpers and add new
helpers which are used later in the series.
* Patches 6 to 8 rework some feature enablement code so that this can
work in the window between cpucap detection and alternative patching
without the need to use cpus_have_const_cap().
* Patch 9 moves KVM entirely over to cpus_have_final_cap().
* Patches 10 to 13 clean up the ARM64_HAS_NO_FPSIMD cpucap, inverting
this and making it behave the same way as all other system cpucaps.
* Patches 14 to 37 migrate code away from cpus_have_const_cap().
* Patch 38 removes the now-unused cpus_have_const_cap().
The series is based on v6.6-rc3.
Since v1 [1]:
* Restore missing tags from Marc and Thomas
* Trivial rebase from v6.6-rc2 to v6.6-rc3
* Split cpucap ordering assertions into a separate patch
* Removed stale reference in __system_matches_cap()
* Folded in Reviewed-by tags
Since v2 [2]:
* Update generated-y for s/cpucaps.h/cpucap-defs.h/
[1] https://lore.kernel.org/linux-arm-kernel/20230919092850.1940729-1-mark.rutland@arm.com/
[2] https://lore.kernel.org/linux-arm-kernel/20231005095025.1872048-1-mark.rutland@arm.com/
Mark Rutland (38):
clocksource/drivers/arm_arch_timer: Initialize evtstrm after
finalizing cpucaps
arm64/arm: xen: enlighten: Fix KPTI checks
arm64: Factor out cpucap definitions
arm64: Add cpucap_is_possible()
arm64: Add cpus_have_final_boot_cap()
arm64: Rework setup_cpu_features()
arm64: Fixup user features at boot time
arm64: Split kpti_install_ng_mappings()
arm64: kvm: Use cpus_have_final_cap() explicitly
arm64: Explicitly save/restore CPACR when probing SVE and SME
arm64: Use build-time assertions for cpucap ordering
arm64: Rename SVE/SME cpu_enable functions
arm64: Use a positive cpucap for FP/SIMD
arm64: Avoid cpus_have_const_cap() for
ARM64_HAS_{ADDRESS,GENERIC}_AUTH
arm64: Avoid cpus_have_const_cap() for ARM64_HAS_ARMv8_4_TTL
arm64: Avoid cpus_have_const_cap() for ARM64_HAS_BTI
arm64: Avoid cpus_have_const_cap() for ARM64_HAS_CACHE_DIC
arm64: Avoid cpus_have_const_cap() for ARM64_HAS_CNP
arm64: Avoid cpus_have_const_cap() for ARM64_HAS_DIT
arm64: Avoid cpus_have_const_cap() for ARM64_HAS_GIC_PRIO_MASKING
arm64: Avoid cpus_have_const_cap() for ARM64_HAS_PAN
arm64: Avoid cpus_have_const_cap() for ARM64_HAS_EPAN
arm64: Avoid cpus_have_const_cap() for ARM64_HAS_RNG
arm64: Avoid cpus_have_const_cap() for ARM64_HAS_WFXT
arm64: Avoid cpus_have_const_cap() for ARM64_HAS_TLB_RANGE
arm64: Avoid cpus_have_const_cap() for ARM64_MTE
arm64: Avoid cpus_have_const_cap() for ARM64_SSBS
arm64: Avoid cpus_have_const_cap() for ARM64_SPECTRE_V2
arm64: Avoid cpus_have_const_cap() for ARM64_{SVE,SME,SME2,FA64}
arm64: Avoid cpus_have_const_cap() for ARM64_UNMAP_KERNEL_AT_EL0
arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_843419
arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_1542419
arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_1742098
arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_2645198
arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_CAVIUM_23154
arm64: Avoid cpus_have_const_cap() for
ARM64_WORKAROUND_NVIDIA_CARMEL_CNP
arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_REPEAT_TLBI
arm64: Remove cpus_have_const_cap()
arch/arm/xen/enlighten.c | 25 +--
arch/arm64/include/asm/Kbuild | 2 +-
arch/arm64/include/asm/alternative-macros.h | 8 +-
arch/arm64/include/asm/arch_gicv3.h | 8 +
arch/arm64/include/asm/archrandom.h | 2 +-
arch/arm64/include/asm/cacheflush.h | 2 +-
arch/arm64/include/asm/cpucaps.h | 67 ++++++++
arch/arm64/include/asm/cpufeature.h | 96 +++++------
arch/arm64/include/asm/fpsimd.h | 35 +++-
arch/arm64/include/asm/irqflags.h | 20 +--
arch/arm64/include/asm/kvm_emulate.h | 4 +-
arch/arm64/include/asm/kvm_host.h | 2 +-
arch/arm64/include/asm/kvm_mmu.h | 2 +-
arch/arm64/include/asm/mmu.h | 2 +-
arch/arm64/include/asm/mmu_context.h | 28 ++--
arch/arm64/include/asm/module.h | 3 +-
arch/arm64/include/asm/pgtable-prot.h | 6 +-
arch/arm64/include/asm/spectre.h | 2 +-
arch/arm64/include/asm/tlbflush.h | 7 +-
arch/arm64/include/asm/vectors.h | 2 +-
arch/arm64/kernel/cpu_errata.c | 17 --
arch/arm64/kernel/cpufeature.c | 168 ++++++++++++--------
arch/arm64/kernel/efi.c | 3 +-
arch/arm64/kernel/fpsimd.c | 81 ++++++----
arch/arm64/kernel/module-plts.c | 7 +-
arch/arm64/kernel/process.c | 2 +-
arch/arm64/kernel/proton-pack.c | 2 +-
arch/arm64/kernel/smp.c | 3 +-
arch/arm64/kernel/suspend.c | 13 +-
arch/arm64/kernel/sys_compat.c | 2 +-
arch/arm64/kernel/traps.c | 2 +-
arch/arm64/kernel/vdso.c | 2 +-
arch/arm64/kvm/arm.c | 10 +-
arch/arm64/kvm/guest.c | 4 +-
arch/arm64/kvm/hyp/pgtable.c | 4 +-
arch/arm64/kvm/mmu.c | 2 +-
arch/arm64/kvm/sys_regs.c | 2 +-
arch/arm64/kvm/vgic/vgic-v3.c | 2 +-
arch/arm64/lib/delay.c | 2 +-
arch/arm64/mm/fault.c | 2 +-
arch/arm64/mm/hugetlbpage.c | 3 +-
arch/arm64/mm/mmap.c | 2 +-
arch/arm64/mm/mmu.c | 3 +-
arch/arm64/mm/proc.S | 3 +-
arch/arm64/tools/Makefile | 4 +-
arch/arm64/tools/cpucaps | 2 +-
arch/arm64/tools/gen-cpucaps.awk | 6 +-
drivers/clocksource/arm_arch_timer.c | 31 +++-
drivers/irqchip/irq-gic-v3.c | 11 --
include/linux/cpuhotplug.h | 2 +
50 files changed, 430 insertions(+), 290 deletions(-)
create mode 100644 arch/arm64/include/asm/cpucaps.h
--
2.30.2