[PATCH 0/4] arm64: Support the TSO memory model
Hector Martin
marcan at marcan.st
Wed Apr 10 17:51:19 PDT 2024
x86 CPUs implement a stricter memory model (TSO) than ARM64. For this
reason, x86 emulation on baseline ARM64 systems requires very expensive
memory model emulation. Having hardware that supports this natively is
therefore very attractive. Such hardware, in fact, exists. This series
adds support for userspace to identify when TSO is available and
toggle it on, if supported.
Some ARM64 CPUs intrinsically implement the TSO memory model, while
others expose it as an IMPDEF control. Apple Silicon SoCs are in the
latter category. Using TSO for x86 emulation on chips that support it
has been shown to provide a massive performance boost [1].
Patch 1 introduces the PR_{SET,GET}_MEM_MODEL userspace control, which
is initially not implemented for any architectures.
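As a rough userspace sketch of how this control might be used: the
option numbers and model values below are placeholders assumed for
illustration (this cover letter does not spell them out), and the
authoritative definitions are the ones in include/uapi/linux/prctl.h of
the patched tree. The helper names are hypothetical. On a kernel or CPU
without support, the calls are expected to fail, typically with EINVAL.

```c
#include <errno.h>
#include <sys/prctl.h>

/* Placeholder values, assumed for illustration only. Prefer the
 * definitions from the patched include/uapi/linux/prctl.h. */
#ifndef PR_GET_MEM_MODEL
#define PR_GET_MEM_MODEL 0x6d4d444c
#endif
#ifndef PR_SET_MEM_MODEL
#define PR_SET_MEM_MODEL 0x4d4d444c
#endif
#ifndef PR_SET_MEM_MODEL_DEFAULT
#define PR_SET_MEM_MODEL_DEFAULT 0
#endif
#ifndef PR_SET_MEM_MODEL_TSO
#define PR_SET_MEM_MODEL_TSO 1
#endif

/* Ask the kernel to switch the calling task to TSO.
 * Returns 0 on success, or a negative errno value on failure
 * (for example -EINVAL when the kernel or CPU lacks support). */
static int try_enable_tso(void)
{
	if (prctl(PR_SET_MEM_MODEL, (unsigned long)PR_SET_MEM_MODEL_TSO,
		  0UL, 0UL, 0UL) != 0)
		return -errno;
	return 0;
}

/* Query the current model (assumed to be reported as one of the
 * PR_SET_MEM_MODEL_* values); returns a negative errno on failure. */
static int get_mem_model(void)
{
	int ret = prctl(PR_GET_MEM_MODEL, 0UL, 0UL, 0UL, 0UL);

	return ret < 0 ? -errno : ret;
}
```

An emulator would presumably call try_enable_tso() once at startup and
fall back to software memory-model emulation on any nonzero return.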
Patch 2 implements it for CPUs which are known, to the best of my
knowledge, to implement the TSO memory model unconditionally.
This uses the cpufeature mechanism to only enable this if *all* cores in
the system meet the requirements.
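The "all cores" rule can be illustrated with a small standalone model.
This is not the kernel's cpufeature code: it assumes, for the sake of
the sketch, that always-TSO CPUs are recognized by their MIDR_EL1
implementer and part number, and the struct and function names are
hypothetical. The list of matching CPUs is passed in by the caller
rather than hardcoded here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* MIDR_EL1 fields: implementer is bits [31:24], part number is bits [15:4]. */
#define MIDR_IMPLEMENTER(midr) (((midr) >> 24) & 0xffu)
#define MIDR_PARTNUM(midr)     (((midr) >> 4) & 0xfffu)

/* Hypothetical allowlist entry describing one always-TSO CPU model. */
struct cpu_model {
	uint8_t implementer;
	uint16_t partnum;
};

/* True if one core's MIDR matches an entry in the always-TSO list. */
static bool cpu_is_always_tso(uint32_t midr,
			      const struct cpu_model *list, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (MIDR_IMPLEMENTER(midr) == list[i].implementer &&
		    MIDR_PARTNUM(midr) == list[i].partnum)
			return true;
	}
	return false;
}

/* System-wide view: every core must match; a single miss disables it. */
static bool system_is_always_tso(const uint32_t *midrs, size_t ncpus,
				 const struct cpu_model *list, size_t n)
{
	if (ncpus == 0)
		return false;
	for (size_t i = 0; i < ncpus; i++) {
		if (!cpu_is_always_tso(midrs[i], list, n))
			return false;
	}
	return true;
}
```

Requiring every core to match means a task can migrate between cores
without its memory model silently changing underneath it.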
Patch 3 adds the scaffolding necessary to save/restore the ACTLR_EL1
register across context switches. This register contains IMPDEF flags
related to CPU execution, and on Apple CPUs this is where the runtime
TSO toggle bit is implemented. Other CPUs could conceivably benefit from
this scaffolding if they also use ACTLR_EL1 for things that could
ostensibly be runtime controlled and context-switched. For this to work,
ACTLR_EL1 must have a uniform layout across all cores in the system.
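The save/restore idea can be modelled in plain C. This is a simplified
userspace sketch, not the code from the patch: the register is replaced
by an ordinary variable, and all names here (fake_actlr_el1,
struct thread_state, actlr_thread_switch) are hypothetical.

```c
#include <stdint.h>

/* Stand-in for the real ACTLR_EL1 system register (hypothetical model). */
static uint64_t fake_actlr_el1;
static unsigned int fake_actlr_writes;

struct thread_state {
	uint64_t actlr;	/* per-thread copy of ACTLR_EL1 */
};

static uint64_t read_actlr(void)
{
	return fake_actlr_el1;
}

static void write_actlr(uint64_t val)
{
	fake_actlr_el1 = val;
	fake_actlr_writes++;
}

/* Save the outgoing thread's value, then load the incoming thread's
 * value, skipping the register write when nothing changes. */
static void actlr_thread_switch(struct thread_state *prev,
				const struct thread_state *next)
{
	prev->actlr = read_actlr();
	if (prev->actlr != next->actlr)
		write_actlr(next->actlr);
}
```

Skipping the write when both threads hold the same value keeps the
common case (no task using a non-default setting) cheap in this model.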
Finally, patch 4 implements PR_{SET,GET}_MEM_MODEL for Apple CPUs by
hooking it up to flip the appropriate ACTLR_EL1 bit when the Apple TSO
feature is detected (on all CPUs, which also implies the uniform
ACTLR_EL1 layout).
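A minimal sketch of what flipping the bit amounts to, again as a
standalone model rather than the patch itself. The bit position, the
enum values and the function names are assumptions made for
illustration; the real definitions live in the patch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit position, assumed for illustration only. */
#define HYP_ACTLR_TSO_BIT (UINT64_C(1) << 1)

enum mem_model { MEM_MODEL_DEFAULT = 0, MEM_MODEL_TSO = 1 };

struct task_ctx {
	uint64_t actlr;	/* value loaded into ACTLR_EL1 when the task runs */
};

/* Update the task's saved ACTLR value for the requested model.
 * Returns 0 on success, -EINVAL for an unknown or unsupported model. */
static int task_set_mem_model(struct task_ctx *t, bool has_tso, int model)
{
	switch (model) {
	case MEM_MODEL_DEFAULT:
		t->actlr &= ~HYP_ACTLR_TSO_BIT;
		return 0;
	case MEM_MODEL_TSO:
		if (!has_tso)
			return -EINVAL;
		t->actlr |= HYP_ACTLR_TSO_BIT;
		return 0;
	default:
		return -EINVAL;
	}
}

/* Report the model implied by the task's saved ACTLR value. */
static int task_get_mem_model(const struct task_ctx *t)
{
	return (t->actlr & HYP_ACTLR_TSO_BIT) ? MEM_MODEL_TSO
					      : MEM_MODEL_DEFAULT;
}
```

Only the one bit is touched, so any other IMPDEF flags in the saved
value are preserved across a model change.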
This series has been brewing in the downstream Asahi Linux tree [2] for a
while now, and ships to thousands of users. A subset have been using it
with FEX-Emu, which already supports this feature. This rebase on
v6.9-rc1 is only build-tested (all intermediate commits with and without
the config enabled, on ARM64) but I'll update the downstream branch soon
with this version and get it pushed out to users/testers.
The Apple support works on bare metal and *should* work exactly the same
way on macOS VMs (as alluded to by Zayd in his independent submission [3]),
though I haven't personally verified this. KVM support for this is left
for a future patchset.
(Apologies for the large Cc: list; I want to make sure nobody who got
Cced on Zayd's alternate take is left out of this one.)
[1] https://fex-emu.com/FEX-2306/
[2] https://github.com/AsahiLinux/linux/tree/bits/220-tso
[3] https://lore.kernel.org/lkml/20240410211652.16640-1-zayd_qumsieh@apple.com/
To: Catalin Marinas <catalin.marinas at arm.com>
To: Will Deacon <will at kernel.org>
To: Marc Zyngier <maz at kernel.org>
To: Mark Rutland <mark.rutland at arm.com>
Cc: Zayd Qumsieh <zayd_qumsieh at apple.com>
Cc: Justin Lu <ih_justin at apple.com>
Cc: Ryan Houdek <Houdek.Ryan at fex-emu.org>
Cc: Mark Brown <broonie at kernel.org>
Cc: Ard Biesheuvel <ardb at kernel.org>
Cc: Mateusz Guzik <mjguzik at gmail.com>
Cc: Anshuman Khandual <anshuman.khandual at arm.com>
Cc: Oliver Upton <oliver.upton at linux.dev>
Cc: Miguel Luis <miguel.luis at oracle.com>
Cc: Joey Gouly <joey.gouly at arm.com>
Cc: Christoph Paasch <cpaasch at apple.com>
Cc: Kees Cook <keescook at chromium.org>
Cc: Sami Tolvanen <samitolvanen at google.com>
Cc: Baoquan He <bhe at redhat.com>
Cc: Joel Granados <j.granados at samsung.com>
Cc: Dawei Li <dawei.li at shingroup.cn>
Cc: Andrew Morton <akpm at linux-foundation.org>
Cc: Florent Revest <revest at chromium.org>
Cc: David Hildenbrand <david at redhat.com>
Cc: Stefan Roesch <shr at devkernel.io>
Cc: Andy Chiu <andy.chiu at sifive.com>
Cc: Josh Triplett <josh at joshtriplett.org>
Cc: Oleg Nesterov <oleg at redhat.com>
Cc: Helge Deller <deller at gmx.de>
Cc: Zev Weiss <zev at bewilderbeest.net>
Cc: Ondrej Mosnacek <omosnace at redhat.com>
Cc: Miguel Ojeda <ojeda at kernel.org>
Cc: linux-arm-kernel at lists.infradead.org
Cc: linux-kernel at vger.kernel.org
Cc: Asahi Linux <asahi at lists.linux.dev>
Signed-off-by: Hector Martin <marcan at marcan.st>
---
Hector Martin (4):
      prctl: Introduce PR_{SET,GET}_MEM_MODEL
      arm64: Implement PR_{GET,SET}_MEM_MODEL for always-TSO CPUs
      arm64: Introduce scaffolding to add ACTLR_EL1 to thread state
      arm64: Implement Apple IMPDEF TSO memory model control

 arch/arm64/Kconfig                        | 14 ++++++
 arch/arm64/include/asm/apple_cpufeature.h | 15 +++++++
 arch/arm64/include/asm/cpufeature.h       | 10 +++++
 arch/arm64/include/asm/processor.h        |  3 ++
 arch/arm64/kernel/Makefile                |  3 +-
 arch/arm64/kernel/cpufeature.c            | 11 ++---
 arch/arm64/kernel/cpufeature_impdef.c     | 61 ++++++++++++++++++++++++++
 arch/arm64/kernel/process.c               | 71 +++++++++++++++++++++++++++++++
 arch/arm64/kernel/setup.c                 |  8 ++++
 arch/arm64/tools/cpucaps                  |  2 +
 include/linux/memory_ordering_model.h     | 11 +++++
 include/uapi/linux/prctl.h                |  5 +++
 kernel/sys.c                              | 21 +++++++++
 13 files changed, 229 insertions(+), 6 deletions(-)
---
base-commit: 4cece764965020c22cff7665b18a012006359095
change-id: 20240411-tso-e86fdceb94b8
Best regards,
--
Hector Martin <marcan at marcan.st>