[PATCH v11 00/40] arm64/sme: Initial support for the Scalable Matrix Extension
Shuah Khan
skhan at linuxfoundation.org
Tue Feb 8 10:54:42 PST 2022
On 2/7/22 8:20 AM, Mark Brown wrote:
> This series provides initial support for the ARMv9 Scalable Matrix
> Extension (SME). SME takes the approach used for vectors in SVE and
> extends this to provide architectural support for matrix operations. A
> more detailed overview can be found in [1].
>
> For the kernel SME can be thought of as a series of features which are
> intended to be used together by applications but operate mostly
> orthogonally:
>
> - The ZA matrix register.
> - Streaming mode, in which ZA can be accessed and a subset of SVE
> features are available.
> - A second vector length, used for streaming mode SVE and ZA and
> controlled using a similar interface to that for SVE.
> - TPIDR2, a new userspace controllable system register intended for use
> by the C library for storing context related to the ZA ABI.
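
For anyone following along, here is a minimal sketch of how userspace
might query the streaming mode vector length through the prctl()
interface mentioned above. The PR_SME_GET_VL and PR_SME_VL_LEN_MASK
fallback values are an assumption taken from the mainline
include/uapi/linux/prctl.h and may not match this revision of the
series. The helper names are hypothetical.

```c
#include <sys/prctl.h>

/*
 * Fallbacks for older userspace headers. ASSUMPTION: values copied
 * from mainline include/uapi/linux/prctl.h, not from this series.
 */
#ifndef PR_SME_GET_VL
#define PR_SME_GET_VL		64
#endif
#ifndef PR_SME_VL_LEN_MASK
#define PR_SME_VL_LEN_MASK	0xffff
#endif

/*
 * Extract the vector length (in bytes) from a PR_SME_GET_VL return
 * value. The upper bits carry flags, so they are masked off here.
 * Returns -1 if the prctl() call itself failed.
 */
static int sme_vl_from_ret(int ret)
{
	if (ret < 0)
		return -1;
	return ret & PR_SME_VL_LEN_MASK;
}

/*
 * Hypothetical helper: current streaming mode vector length in bytes,
 * or -1 when the kernel or CPU does not support SME.
 */
static int sme_get_vl(void)
{
	return sme_vl_from_ret(prctl(PR_SME_GET_VL));
}
```

On a system without SME the prctl() is expected to fail, so callers
should treat -1 as "SME not available" rather than as an error to
report.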
>
> A substantial part of the series is dedicated to refactoring the
> existing SVE support so that we don't need to duplicate code for
> handling vector lengths and the SVE registers, this involves creating an
> array of vector types and making the users take the vector type as a
> parameter. I'm not 100% happy with this but wasn't able to come up with
> anything better, duplicating code definitely felt like a bad idea so
> this felt like the least bad thing. If this approach makes sense to
> people it might make sense to split this off into a separate series
> and/or merge it while the rest is pending review. The series is very
> large, so it would probably be easier to digest if some of the
> preparatory refactoring could be merged before the rest is ready.
>
> One feature of the architecture of particular note is that switching
> to and from streaming mode may change the size of and invalidate the
> contents of the SVE registers, and when in streaming mode the FFR is not
> accessible. This complicates aspects of the ABI like signal handling
> and ptrace.
>
> This initial implementation is mainly intended to get the ABI in place,
> there are several areas which will be worked on going forwards - some of
> these will be blockers, others could be handled in followup series:
>
> - SME is currently not supported for KVM guests, this will be done as a
> followup series. A host system can use SME and run KVM guests but
> SME is not available in the guests.
> - The KVM host support is done in a very simplistic way, were anyone to
> attempt to use it in production there would be performance impacts on
> hosts with SME support. As part of this we also add enumeration of
> fine grained traps.
> - There is not currently ptrace or signal support for TPIDR2, this will be
> done as a followup series.
> - No support is currently provided for scheduler control of SME or SME
> applications, given the size of the SME register state the context
> switch overhead may be noticeable so this may be needed especially for
> real time applications. Similar concerns already exist for larger
> SVE vector lengths but are amplified for SME, particularly as the
> vector length increases.
> - There has been no work on optimising the performance of anything the
> kernel does.
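
To put a rough number on the context switch concern above, the sketch
below estimates the per-task register state for a given streaming
vector length. It assumes the architectural layout (ZA is a square
array of SVL x SVL bits, with 32 Z registers of SVL bits and 16 P
registers of SVL/8 bits) and ignores FFR, SVCR and any kernel
bookkeeping. The function names are hypothetical and this is not how
the series itself sizes its buffers.

```c
#include <stddef.h>

/* ZA is SVL x SVL bits, i.e. (SVL/8) x (SVL/8) bytes. */
static size_t za_size_bytes(unsigned int svl_bits)
{
	size_t svl_bytes = svl_bits / 8;

	return svl_bytes * svl_bytes;
}

/* Z0-Z31 are SVL bits each, P0-P15 are SVL/8 bits each. */
static size_t ssve_size_bytes(unsigned int svl_bits)
{
	size_t svl_bytes = svl_bits / 8;

	return 32 * svl_bytes + 16 * (svl_bytes / 8);
}

/* Rough total of ZA plus streaming SVE register state in bytes. */
static size_t sme_state_bytes(unsigned int svl_bits)
{
	return za_size_bytes(svl_bits) + ssve_size_bytes(svl_bits);
}
```

ZA grows with the square of the vector length, so it dominates at the
larger sizes: 256 bytes at a 128 bit SVL but 64KiB at 2048 bits.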
>
> It is not expected that any systems will be encountered that support SME
> but not SVE, SME is an ARMv9 feature and SVE is mandatory for ARMv9.
> The code attempts to handle any such systems that are encountered but
> this hasn't been tested extensively.
>
> v11:
> - Rebase onto v5.17-rc3.
> - Provide a sme-inst.h to collect manual encodings in kselftest.
> v10:
> - Actually do the rebase of fixups from the previous version into
> relevant patches.
> v9:
> - Remove defensive programming around IS_ENABLED() and FGT in KVM code.
> - Fix naming of TPIDR2 FGT register bit.
> - Add patches making handling of floating point register bits more
> consistent (also sent as separate series).
> - Drop now unused enumeration of fine grained traps.
> v8:
> - Rebase onto v5.17-rc1.
> - Support interoperation with KVM, SME is disabled for KVM guests with
> minimal handling for cleaning up SME state when entering and leaving
> the guest.
> - Document and implement that signal handlers are invoked with ZA and
> streaming mode disabled.
> - Use the RDSVL instruction introduced in EAC2 of the architecture to
> obtain the streaming mode vector length during enumeration, ZA state
> loading/saving and in test programs.
> - Store a pointer to SVCR in fpsimd_last_state and use it in fpsimd_save()
> for interoperation with KVM.
> - Add a test case sme_trap_no_sm checking that we generate a SIGILL
> when using an instruction that requires streaming mode without
> enabling it.
> - Add basic ZA context form validation to testcases helper library.
> - Move signal tests over to validating streaming VL from ZA information.
> - Pulled in patch removing ARRAY_SIZE() so that kselftest builds
> cleanly and to avoid trivial conflicts.
> v7:
> - Rebase onto v5.16-rc3.
> - Reduce indentation when supporting custom triggers for signal tests
> as suggested by Catalin.
> - Change to specifying a width for all CPU features rather than adding
> single bit specific infrastructure.
> - Don't require zeroing of non-shared SVE state during syscalls.
> v6:
> - Rebase onto v5.16-rc1.
> - Return to disabling TIF_SVE on kernel entry even if we have SME
> state, this avoids the need for KVM to handle the case where TIF_SVE
> is set on guest entry.
> - Add syscall-abi.h to SME updates to syscall-abi, mistakenly omitted
> from commit.
> v5:
> - Rebase onto currently merged SVE and kselftest patches.
> - Add support for the FA64 option, introduced in the recently published
> EAC1 update to the specification.
> - Pull in test program for the syscall ABI previously sent separately
> with some revisions and add coverage for the SME ABI.
> - Fix checking for options with 1 bit fields in ID_AA64SMFR0_EL1.
> - Minor fixes and clarifications to the ABI documentation.
> v4:
> - Rebase onto merged patches.
> - Remove an unneeded NULL check in vec_proc_do_default_vl().
> - Include patch to factor out utility routines in kselftests written in
> assembler.
> - Specify -ffreestanding when building TPIDR2 test.
> v3:
> - Skip FFR rather than predicate registers in sve_flush_live().
> - Don't assume a bool is all zeros in sve_flush_live() as per AAPCS.
> - Don't redundantly specify a zero index when clearing FFR.
> v2:
> - Fix several issues with !SME and !SVE configurations.
> - Preserve TPIDR2 when creating a new thread/process unless
> CLONE_SETTLS is set.
> - Report traps due to using features in an invalid mode as SIGILL.
> - Spell out streaming mode behaviour in SVE ABI documentation more
> directly.
> - Document TPIDR2 in the ABI document.
> - Use SMSTART and SMSTOP rather than read/modify/write sequences.
> - Rework logic for exiting streaming mode on syscall.
> - Don't needlessly initialise SVCR on access trap.
> - Always restore SME VL for userspace if SME traps are disabled.
> - Only yield to encourage preemption every 128 iterations in za-test,
> otherwise do a getpid(), and validate SVCR after syscall.
> - Leave streaming mode disabled except when reading the vector length
> in za-test, and disable ZA after detecting a mismatch.
> - Add SME support to vlset.
> - Clarifications and typo fixes in comments.
> - Move sme_alloc() forward declaration back a patch.
>
> [1] https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/scalable-matrix-extension-armv9-a-architecture
>
> Mark Brown (40):
> arm64: Define CPACR_EL1_FPEN similarly to other floating point
> controls
> arm64: Always use individual bits in CPACR floating point enables
> arm64: cpufeature: Always specify and use a field width for
> capabilities
> kselftest/arm64: Remove local ARRAY_SIZE() definitions
> kselftest/arm64: signal: Allow tests to be incompatible with features
> arm64/sme: Provide ABI documentation for SME
> arm64/sme: System register and exception syndrome definitions
> arm64/sme: Manually encode SME instructions
> arm64/sme: Early CPU setup for SME
> arm64/sme: Basic enumeration support
> arm64/sme: Identify supported SME vector lengths at boot
> arm64/sme: Implement sysctl to set the default vector length
> arm64/sme: Implement vector length configuration prctl()s
> arm64/sme: Implement support for TPIDR2
> arm64/sme: Implement SVCR context switching
> arm64/sme: Implement streaming SVE context switching
> arm64/sme: Implement ZA context switching
> arm64/sme: Implement traps and syscall handling for SME
> arm64/sme: Disable ZA and streaming mode when handling signals
> arm64/sme: Implement streaming SVE signal handling
> arm64/sme: Implement ZA signal handling
> arm64/sme: Implement ptrace support for streaming mode SVE registers
> arm64/sme: Add ptrace support for ZA
> arm64/sme: Disable streaming mode and ZA when flushing CPU state
> arm64/sme: Save and restore streaming mode over EFI runtime calls
> KVM: arm64: Hide SME system registers from guests
> KVM: arm64: Trap SME usage in guest
> KVM: arm64: Handle SME host state when running guests
> arm64/sme: Provide Kconfig for SME
> kselftest/arm64: Add manual encodings for SME instructions
> kselftest/arm64: sme: Add SME support to vlset
> kselftest/arm64: Add tests for TPIDR2
> kselftest/arm64: Extend vector configuration API tests to cover SME
> kselftest/arm64: sme: Provide streaming mode SVE stress test
> kselftest/arm64: signal: Handle ZA signal context in core code
> kselftest/arm64: Add stress test for SME ZA context switching
> kselftest/arm64: signal: Add SME signal handling tests
> kselftest/arm64: Add streaming SVE to SVE ptrace tests
> kselftest/arm64: Add coverage for the ZA ptrace interface
> kselftest/arm64: Add SME support to syscall ABI test
>
> Documentation/arm64/elf_hwcaps.rst | 33 +
> Documentation/arm64/index.rst | 1 +
> Documentation/arm64/sme.rst | 432 +++++++++++++
> Documentation/arm64/sve.rst | 70 ++-
> arch/arm64/Kconfig | 11 +
> arch/arm64/include/asm/cpu.h | 4 +
> arch/arm64/include/asm/cpufeature.h | 25 +
> arch/arm64/include/asm/el2_setup.h | 64 +-
> arch/arm64/include/asm/esr.h | 13 +-
> arch/arm64/include/asm/exception.h | 1 +
> arch/arm64/include/asm/fpsimd.h | 110 +++-
> arch/arm64/include/asm/fpsimdmacros.h | 86 +++
> arch/arm64/include/asm/hwcap.h | 8 +
> arch/arm64/include/asm/kvm_arm.h | 5 +-
> arch/arm64/include/asm/kvm_host.h | 4 +
> arch/arm64/include/asm/processor.h | 18 +-
> arch/arm64/include/asm/sysreg.h | 67 +-
> arch/arm64/include/asm/thread_info.h | 2 +
> arch/arm64/include/uapi/asm/hwcap.h | 8 +
> arch/arm64/include/uapi/asm/ptrace.h | 69 ++-
> arch/arm64/include/uapi/asm/sigcontext.h | 55 +-
> arch/arm64/kernel/cpufeature.c | 273 ++++++--
> arch/arm64/kernel/cpuinfo.c | 13 +
> arch/arm64/kernel/entry-common.c | 11 +
> arch/arm64/kernel/entry-fpsimd.S | 36 ++
> arch/arm64/kernel/fpsimd.c | 585 ++++++++++++++++--
> arch/arm64/kernel/process.c | 28 +-
> arch/arm64/kernel/ptrace.c | 356 +++++++++--
> arch/arm64/kernel/signal.c | 194 +++++-
> arch/arm64/kernel/syscall.c | 34 +-
> arch/arm64/kernel/traps.c | 1 +
> arch/arm64/kvm/fpsimd.c | 43 +-
> arch/arm64/kvm/hyp/include/hyp/switch.h | 4 +-
> arch/arm64/kvm/hyp/nvhe/switch.c | 30 +
> arch/arm64/kvm/hyp/vhe/switch.c | 15 +-
> arch/arm64/kvm/sys_regs.c | 9 +-
> arch/arm64/tools/cpucaps | 2 +
> include/uapi/linux/elf.h | 2 +
> include/uapi/linux/prctl.h | 9 +
> kernel/sys.c | 12 +
> tools/testing/selftests/arm64/abi/.gitignore | 1 +
> tools/testing/selftests/arm64/abi/Makefile | 9 +-
> .../selftests/arm64/abi/syscall-abi-asm.S | 69 ++-
> .../testing/selftests/arm64/abi/syscall-abi.c | 205 +++++-
> .../testing/selftests/arm64/abi/syscall-abi.h | 15 +
> tools/testing/selftests/arm64/abi/tpidr2.c | 298 +++++++++
> tools/testing/selftests/arm64/fp/.gitignore | 4 +
> tools/testing/selftests/arm64/fp/Makefile | 12 +-
> tools/testing/selftests/arm64/fp/rdvl-sme.c | 14 +
> tools/testing/selftests/arm64/fp/rdvl.S | 10 +
> tools/testing/selftests/arm64/fp/rdvl.h | 1 +
> tools/testing/selftests/arm64/fp/sme-inst.h | 51 ++
> tools/testing/selftests/arm64/fp/ssve-stress | 59 ++
> tools/testing/selftests/arm64/fp/sve-ptrace.c | 13 +-
> tools/testing/selftests/arm64/fp/sve-test.S | 20 +
> tools/testing/selftests/arm64/fp/vec-syscfg.c | 10 +
> tools/testing/selftests/arm64/fp/vlset.c | 10 +-
> tools/testing/selftests/arm64/fp/za-ptrace.c | 354 +++++++++++
> tools/testing/selftests/arm64/fp/za-stress | 59 ++
> tools/testing/selftests/arm64/fp/za-test.S | 388 ++++++++++++
> .../testing/selftests/arm64/signal/.gitignore | 2 +
> .../selftests/arm64/signal/test_signals.h | 5 +
> .../arm64/signal/test_signals_utils.c | 40 +-
> .../arm64/signal/test_signals_utils.h | 2 +
> .../testcases/fake_sigreturn_sme_change_vl.c | 92 +++
> .../arm64/signal/testcases/sme_trap_no_sm.c | 38 ++
> .../signal/testcases/sme_trap_non_streaming.c | 45 ++
> .../arm64/signal/testcases/sme_trap_za.c | 36 ++
> .../selftests/arm64/signal/testcases/sme_vl.c | 68 ++
> .../arm64/signal/testcases/ssve_regs.c | 129 ++++
> .../arm64/signal/testcases/testcases.c | 36 ++
> .../arm64/signal/testcases/testcases.h | 3 +-
> 72 files changed, 4590 insertions(+), 251 deletions(-)
> create mode 100644 Documentation/arm64/sme.rst
> create mode 100644 tools/testing/selftests/arm64/abi/syscall-abi.h
> create mode 100644 tools/testing/selftests/arm64/abi/tpidr2.c
> create mode 100644 tools/testing/selftests/arm64/fp/rdvl-sme.c
> create mode 100644 tools/testing/selftests/arm64/fp/sme-inst.h
> create mode 100644 tools/testing/selftests/arm64/fp/ssve-stress
> create mode 100644 tools/testing/selftests/arm64/fp/za-ptrace.c
> create mode 100644 tools/testing/selftests/arm64/fp/za-stress
> create mode 100644 tools/testing/selftests/arm64/fp/za-test.S
> create mode 100644 tools/testing/selftests/arm64/signal/testcases/fake_sigreturn_sme_change_vl.c
> create mode 100644 tools/testing/selftests/arm64/signal/testcases/sme_trap_no_sm.c
> create mode 100644 tools/testing/selftests/arm64/signal/testcases/sme_trap_non_streaming.c
> create mode 100644 tools/testing/selftests/arm64/signal/testcases/sme_trap_za.c
> create mode 100644 tools/testing/selftests/arm64/signal/testcases/sme_vl.c
> create mode 100644 tools/testing/selftests/arm64/signal/testcases/ssve_regs.c
>
>
> base-commit: dfd42facf1e4ada021b939b4e19c935dcdd55566
>
Mark,
Kselftest patches look good to me with your responses to my comments
on individual patches. In the interest of not generating more email
traffic - responding to all here. I already added Reviewed-by to the
patches.
Just the doc patch - SPDX needs addressing.
thanks,
-- Shuah