Feature request for enabling SCTLR_ELx.nAA

Catalin Marinas catalin.marinas at arm.com
Thu Feb 23 04:15:43 PST 2023


On Wed, Feb 22, 2023 at 01:17:55PM -1000, Richard Henderson wrote:
> It would be helpful to have a prctl for enabling nAA.  Since we already have
> task->thread.sctlr_user, it would seem that this would not require any
> additional overhead during __switch_to().

This shouldn't be difficult to add.

> My use case is the QEMU JIT, and being able to make use of LDAR/STLR instead
> of explicit DMB in some cases.  At the moment, I can only make this
> replacement when the address is provably aligned, which is tricky to do with
> the time budget of a JIT, so the replacement rarely triggers.  This ought to
> make a difference when emulating strongly ordered guests like x86.

It looks like in 4.17 (commit 7206dc93a58f, "arm64: Expose Arm v8.4
features") we exposed the LSE2 features as HWCAP_USCAT (unaligned
single-copy atomicity) but that still restricts LDAR/STLR to a 16-byte
boundary as there is no control for SCTLR_EL1.nAA.
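
To make the 16-byte restriction concrete, here is a minimal userspace
sketch, not taken from any existing code base. It assumes Linux with
glibc's getauxval(); the helper names have_uscat() and crosses_16_bytes()
are made up for illustration. The first reports whether the kernel
advertises HWCAP_USCAT, the second whether a given access would span an
aligned 16-byte boundary, which is the case that still needs nAA.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/auxv.h>

/* arm64 value from <asm/hwcap.h>; defined here only so the sketch
 * also compiles when the header does not provide it. */
#ifndef HWCAP_USCAT
#define HWCAP_USCAT (1UL << 25)
#endif

/* Illustrative helper: true if the kernel reports HWCAP_USCAT.
 * Always false when not built for arm64. */
static bool have_uscat(void)
{
#if defined(__aarch64__)
	return (getauxval(AT_HWCAP) & HWCAP_USCAT) != 0;
#else
	return false;
#endif
}

/* Illustrative helper: true if an access of 'size' bytes starting at
 * 'addr' does not fit inside a single aligned 16-byte window. */
static bool crosses_16_bytes(uintptr_t addr, size_t size)
{
	return (addr & 15) + size > 16;
}
```

A JIT that cannot prove crosses_16_bytes() is false for a guest access
would have to keep the explicit barrier sequence unless nAA is enabled.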

Given that allowing unaligned accesses could break atomicity, I wouldn't
set this bit to 1 permanently; the alignment fault helps catch tricky
software bugs. So a prctl() makes more sense. If your intended use is
just preserving the acquire/release semantics, I don't think these are
affected by the atomicity rules even if the access crosses a 16-byte
boundary.
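
Purely as an illustration of how userspace might consume such a control,
a rough sketch follows. The option name PR_SET_NAA_HYPOTHETICAL and its
number are invented placeholders, not a real kernel interface; only the
prctl() call itself is an existing API. A kernel that does not recognise
the option is expected to fail the call, so the caller can fall back.

```c
#include <errno.h>
#include <sys/prctl.h>

/* HYPOTHETICAL: placeholder name and number for illustration only.
 * This is not an existing prctl option. */
#define PR_SET_NAA_HYPOTHETICAL 0x6e414100

/* Ask the kernel to enable nAA for this task via the hypothetical
 * option. Returns 0 on success, or a negative errno value if the
 * request was rejected (for example, because the option is unknown). */
static int try_enable_naa(void)
{
	if (prctl(PR_SET_NAA_HYPOTHETICAL, 1UL, 0UL, 0UL, 0UL) == 0)
		return 0;
	return -errno;
}
```

On a non-zero return the JIT would simply keep emitting explicit DMB
around accesses it cannot prove aligned, as it does today.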

Adding Will and Mark for their view on this.

-- 
Catalin


