[PATCH 0/4] arm64: Support the TSO memory model

Neal Gompa neal at gompa.dev
Wed Apr 10 18:37:47 PDT 2024


On Wed, Apr 10, 2024 at 8:51 PM Hector Martin <marcan at marcan.st> wrote:
>
> x86 CPUs implement a stricter memory model (TSO) than ARM64. For this
> reason, x86 emulation on baseline ARM64 systems requires very expensive
> memory model emulation. Having hardware that supports this natively is
> therefore very attractive. Such hardware, in fact, exists. This series
> adds support for userspace to identify when TSO is available and
> toggle it on, if supported.
>
> Some ARM64 CPUs intrinsically implement the TSO memory model, while
> others expose it as an IMPDEF control. Apple Silicon SoCs are in the
> latter category. Using TSO for x86 emulation on chips that support it
> has been shown to provide a massive performance boost [1].
>
> Patch 1 introduces the PR_{SET,GET}_MEM_MODEL userspace control, which
> is initially not implemented for any architectures.
>
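
For anyone who wants to poke at the interface from patch 1, here is a
minimal userspace sketch. The constant values are assumptions: they are
not stated in this cover letter and are not in mainline headers, so take
the real ones from include/uapi/linux/prctl.h in the patched tree. The
helper names are hypothetical. On a kernel without the series, both calls
should simply fail (most likely with EINVAL).

```c
#include <errno.h>
#include <sys/prctl.h>

/* ASSUMPTION: placeholder values, not defined by mainline headers.
 * Check them against the patched include/uapi/linux/prctl.h. */
#ifndef PR_SET_MEM_MODEL
#define PR_GET_MEM_MODEL         0x6d4d444c
#define PR_SET_MEM_MODEL         0x4d4d444c
#define PR_SET_MEM_MODEL_DEFAULT 0
#define PR_SET_MEM_MODEL_TSO     1
#endif

/* Hypothetical helper: request TSO for the calling task.
 * Returns 0 on success, or a negative errno if the kernel or CPU
 * does not support the request. */
static int try_enable_tso(void)
{
	if (prctl(PR_SET_MEM_MODEL, (unsigned long)PR_SET_MEM_MODEL_TSO,
		  0UL, 0UL, 0UL) == 0)
		return 0;
	return -errno;
}

/* Hypothetical helper: query the current memory model.
 * Returns the model value, or a negative errno on failure. */
static int current_mem_model(void)
{
	int ret = prctl(PR_GET_MEM_MODEL, 0UL, 0UL, 0UL, 0UL);

	return ret < 0 ? -errno : ret;
}
```

An emulator would presumably call try_enable_tso() once at startup and
fall back to software memory model emulation on any negative return.
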
> Patch 2 implements it for CPUs which are known, to the best of my
> knowledge, to implement the TSO memory model unconditionally.
> This uses the cpufeature mechanism to only enable this if *all* cores in
> the system meet the requirements.
>
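
The "all cores" rule above amounts to a logical AND over per-core
properties. A toy model, assuming a hypothetical core_info structure
(the kernel's cpufeature code is organized differently; this only
illustrates the rule):

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-core description, for illustration only. */
struct core_info {
	bool always_tso; /* core is known to implement TSO unconditionally */
};

/* A task can be scheduled on any core, so the system-wide capability
 * holds only if every core has the property. */
static bool system_has_tso(const struct core_info *cores, size_t n)
{
	if (n == 0)
		return false;

	for (size_t i = 0; i < n; i++) {
		if (!cores[i].always_tso)
			return false;
	}
	return true;
}
```

A single non-TSO core is enough to make the feature unavailable, which
is the conservative choice for heterogeneous systems.
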
> Patch 3 adds the scaffolding necessary to save/restore the ACTLR_EL1
> register across context switches. This register contains IMPDEF flags
> related to CPU execution, and on Apple CPUs this is where the runtime
> TSO toggle bit is implemented. Other CPUs could conceivably benefit from
> this scaffolding if they also use ACTLR_EL1 for things that could
> ostensibly be runtime controlled and context-switched. For this to work,
> ACTLR_EL1 must have a uniform layout across all cores in the system.
>
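
The save/restore idea in patch 3 can be modelled in plain C. This is a
sketch under stated assumptions, not the code from the series: the
register is simulated with a variable, and thread_state and
actlr_thread_switch are hypothetical names.

```c
#include <stdint.h>

/* Stand-in for the hardware register. Real kernel code would use
 * system register accessors instead of a variable. */
static uint64_t sim_actlr_el1;

/* Hypothetical per-thread state holding that thread's ACTLR_EL1 value. */
struct thread_state {
	uint64_t actlr;
};

/* On a context switch, save the outgoing thread's register value and
 * load the incoming thread's value, skipping the write if it is
 * already current. */
static void actlr_thread_switch(struct thread_state *prev,
				const struct thread_state *next)
{
	prev->actlr = sim_actlr_el1;
	if (next->actlr != sim_actlr_el1)
		sim_actlr_el1 = next->actlr;
}
```

This also shows why the uniform layout matters: a value saved on one
core may be restored on another, so every bit has to mean the same
thing everywhere.
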
> Finally, patch 4 implements PR_{SET,GET}_MEM_MODEL for Apple CPUs by
> hooking it up to flip the appropriate ACTLR_EL1 bit when the Apple TSO
> feature is detected (on all CPUs, which also implies the uniform
> ACTLR_EL1 layout).
>
> This series has been brewing in the downstream Asahi Linux tree [2] for a
> while now, and ships to thousands of users. A subset have been using it
> with FEX-Emu, which already supports this feature. This rebase on
> v6.9-rc1 is only build-tested (all intermediate commits with and without
> the config enabled, on ARM64) but I'll update the downstream branch soon
> with this version and get it pushed out to users/testers.
>
> The Apple support works on bare metal and *should* work exactly the same
> way on macOS VMs (as alluded to by Zayd in his independent submission [3]),
> though I haven't personally verified this. KVM support for this is left
> for a future patchset.
>
> (Apologies for the large Cc: list; I want to make sure nobody who got
> Cced on Zayd's alternate take is left out of this one.)
>
> [1] https://fex-emu.com/FEX-2306/
> [2] https://github.com/AsahiLinux/linux/tree/bits/220-tso
> [3] https://lore.kernel.org/lkml/20240410211652.16640-1-zayd_qumsieh@apple.com/
>
> To: Catalin Marinas <catalin.marinas at arm.com>
> To: Will Deacon <will at kernel.org>
> To: Marc Zyngier <maz at kernel.org>
> To: Mark Rutland <mark.rutland at arm.com>
> Cc: Zayd Qumsieh <zayd_qumsieh at apple.com>
> Cc: Justin Lu <ih_justin at apple.com>
> Cc: Ryan Houdek <Houdek.Ryan at fex-emu.org>
> Cc: Mark Brown <broonie at kernel.org>
> Cc: Ard Biesheuvel <ardb at kernel.org>
> Cc: Mateusz Guzik <mjguzik at gmail.com>
> Cc: Anshuman Khandual <anshuman.khandual at arm.com>
> Cc: Oliver Upton <oliver.upton at linux.dev>
> Cc: Miguel Luis <miguel.luis at oracle.com>
> Cc: Joey Gouly <joey.gouly at arm.com>
> Cc: Christoph Paasch <cpaasch at apple.com>
> Cc: Kees Cook <keescook at chromium.org>
> Cc: Sami Tolvanen <samitolvanen at google.com>
> Cc: Baoquan He <bhe at redhat.com>
> Cc: Joel Granados <j.granados at samsung.com>
> Cc: Dawei Li <dawei.li at shingroup.cn>
> Cc: Andrew Morton <akpm at linux-foundation.org>
> Cc: Florent Revest <revest at chromium.org>
> Cc: David Hildenbrand <david at redhat.com>
> Cc: Stefan Roesch <shr at devkernel.io>
> Cc: Andy Chiu <andy.chiu at sifive.com>
> Cc: Josh Triplett <josh at joshtriplett.org>
> Cc: Oleg Nesterov <oleg at redhat.com>
> Cc: Helge Deller <deller at gmx.de>
> Cc: Zev Weiss <zev at bewilderbeest.net>
> Cc: Ondrej Mosnacek <omosnace at redhat.com>
> Cc: Miguel Ojeda <ojeda at kernel.org>
> Cc: linux-arm-kernel at lists.infradead.org
> Cc: linux-kernel at vger.kernel.org
> Cc: Asahi Linux <asahi at lists.linux.dev>
>
> Signed-off-by: Hector Martin <marcan at marcan.st>
> ---
> Hector Martin (4):
>       prctl: Introduce PR_{SET,GET}_MEM_MODEL
>       arm64: Implement PR_{GET,SET}_MEM_MODEL for always-TSO CPUs
>       arm64: Introduce scaffolding to add ACTLR_EL1 to thread state
>       arm64: Implement Apple IMPDEF TSO memory model control
>
>  arch/arm64/Kconfig                        | 14 ++++++
>  arch/arm64/include/asm/apple_cpufeature.h | 15 +++++++
>  arch/arm64/include/asm/cpufeature.h       | 10 +++++
>  arch/arm64/include/asm/processor.h        |  3 ++
>  arch/arm64/kernel/Makefile                |  3 +-
>  arch/arm64/kernel/cpufeature.c            | 11 ++---
>  arch/arm64/kernel/cpufeature_impdef.c     | 61 ++++++++++++++++++++++++++
>  arch/arm64/kernel/process.c               | 71 +++++++++++++++++++++++++++++++
>  arch/arm64/kernel/setup.c                 |  8 ++++
>  arch/arm64/tools/cpucaps                  |  2 +
>  include/linux/memory_ordering_model.h     | 11 +++++
>  include/uapi/linux/prctl.h                |  5 +++
>  kernel/sys.c                              | 21 +++++++++
>  13 files changed, 229 insertions(+), 6 deletions(-)
> ---
> base-commit: 4cece764965020c22cff7665b18a012006359095
> change-id: 20240411-tso-e86fdceb94b8
>

The series looks good to me.

Reviewed-by: Neal Gompa <neal at gompa.dev>



-- 
真実はいつも一つ!/ Always, there's only one truth!
