Feature request for enabling SCTLR_ELx.nAA

Will Deacon will at kernel.org
Fri Feb 24 09:09:07 PST 2023


On Thu, Feb 23, 2023 at 12:15:43PM +0000, Catalin Marinas wrote:
> On Wed, Feb 22, 2023 at 01:17:55PM -1000, Richard Henderson wrote:
> > It would be helpful to have a prctl for enabling nAA.  Since we already have
> > task->thread.sctlr_user, it would seem that this would not require any
> > additional overhead during __switch_to().
> 
> This shouldn't be difficult to add.
> 
> > My use case is the QEMU JIT, and being able to make use of LDAR/STLR instead
> > of explicit DMB in some cases.  At the moment, I can only make this
> > replacement when the address is provably aligned, which is tricky to do with
> > the time budget of a JIT, so the replacement rarely triggers.  This ought to
> > make a difference when emulating strongly ordered guests like x86.
> 
> It looks like in 4.17 (commit 7206dc93a58f, "arm64: Expose Arm v8.4
> features") we exposed the LSE2 features as HWCAP_USCAT (unaligned
> single-copy atomicity), but that still restricts LDAR/STLR to accesses
> that do not cross a 16-byte boundary, as there is no control for
> SCTLR_EL1.nAA.
> 
> Given that allowing unaligned accesses could break atomicity, I wouldn't
> set this bit to 1 permanently; the alignment fault helps catch tricky
> software bugs.
> So a prctl() makes more sense. If your intended use is just preserving
> the acquire/release semantics, I don't think these are affected by the
> atomicity rules even if they go across a 16-byte boundary.
> 
> Adding Will and Mark for their view on this.

I'd definitely want to see some numbers to justify the complexity of a new
prctl(), but otherwise it sounds fine as long as it's opt-in and cleared on
exec().
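
For anyone following along, here is a rough sketch of the alignment test a
JIT has to get right today, while nAA stays clear. It assumes FEAT_LSE2 is
present (HWCAP_USCAT advertised), in which case LDAR/STLR fault unless every
byte accessed lies within one 16-byte aligned window. The helper names and
the naa_enabled flag are made up for illustration; they are not taken from
QEMU or the kernel, and no such prctl() exists yet.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if an access of `size` bytes at `addr` stays
 * inside a single 16-byte aligned window, i.e. it does not cross a
 * 16-byte boundary. Assumes FEAT_LSE2 behaviour with SCTLR_ELx.nAA == 0.
 */
static bool fits_in_16_byte_window(uintptr_t addr, size_t size)
{
    if (size == 0 || size > 16)
        return false;
    /* Offset within the window plus the access size must not exceed 16. */
    return (addr & 15u) + size <= 16;
}

/*
 * Hypothetical policy: with nAA enabled (the opt-in discussed in this
 * thread) the alignment fault goes away, so the check can be skipped.
 * Otherwise fall back to an explicit barrier unless the access fits.
 */
static bool can_use_ldar_stlr(uintptr_t addr, size_t size, bool naa_enabled)
{
    return naa_enabled || fits_in_16_byte_window(addr, size);
}
```

The catch for a JIT is that addr is usually only known at run time, so the
test above cannot be resolved at translation time without extra analysis.
That is the case where the replacement "rarely triggers".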

Will



More information about the linux-arm-kernel mailing list