[PATCH 4/5] arm64: atomics: lse: improve constraints for simple ops
Mark Rutland
mark.rutland at arm.com
Tue Dec 14 05:04:02 PST 2021
On Mon, Dec 13, 2021 at 07:40:23PM +0000, Will Deacon wrote:
> On Fri, Dec 10, 2021 at 03:14:09PM +0000, Mark Rutland wrote:
> > We have overly conservative assembly constraints for the basic FEAT_LSE
> > atomic instructions, and using more accurate and permissive constraints
> > will allow for better code generation.
> >
> > The FEAT_LSE basic atomic instructions come in two forms:
> >
> >   LD{op}{order}{size} <Rs>, <Rt>, [<Rn>]
> >   ST{op}{order}{size} <Rs>, [<Rn>]
> >
> > For either form, both <Rs> and <Rn> are read but not written back to,
> > and <Rt> is written with the original value of the memory location.
> > Where (<Rt> == <Rs>) or (<Rt> == <Rn>), <Rt> is written *after* the
> > other register value(s) are consumed. There are no UNPREDICTABLE or
> > CONSTRAINED UNPREDICTABLE behaviours when any pair of <Rs>, <Rt>, or
> > <Rn> are the same register.
> >
> > Our current inline assembly always uses <Rs> == <Rt>, treating this
> > register as both an input and an output (using a '+r' constraint). This
> > forces the compiler to do some unnecessary register shuffling and/or
> > redundant value generation.
> >
> > We can improve this with more accurate constraints, separating <Rs> and
> > <Rt>, where <Rs> is an input-only register ('r'), and <Rt> is an
> > output-only value ('=r'). As <Rt> is written back after <Rs> is
> > consumed, it does not need to be earlyclobber ('=&r'), leaving the
> > compiler free to use the same register for both <Rs> and <Rt> where this
> > is desirable.
> >
> > At the same time, the redundant 'r' constraint for `v` is removed, as
> > the `+Q` constraint is sufficient.
> Makes sense to me. I'm not sure _why_ the current constraints are so weird;
> maybe a hangover from when we patched them inline?
Yup; the constraints used to make sense when we were patching the instructions,
and we just never cleaned them up. Details below -- I can add a link tag to
this mail if you'd like?
For example, back when we introduced the ops in commit:
c0385b24af15020a ("arm64: introduce CONFIG_ARM64_LSE_ATOMICS as fallback to ll/sc atomics")
We had:
| #define ATOMIC_OP(op, asm_op) \
| static inline void atomic_##op(int i, atomic_t *v) \
| { \
| 	register int w0 asm ("w0") = i; \
| 	register atomic_t *x1 asm ("x1") = v; \
| \
| 	asm volatile( \
| 	__LL_SC_CALL(op) \
| 	: "+r" (w0), "+Q" (v->counter) \
| 	: "r" (x1) \
| 	: "x30"); \
| } \
IIUC, for those constraints:
* "+r" (w0) -- necessary because the LL-SC forms might clobber w0 (e.g. if
used for their `tmp` variable). For the value-returning ops we had to use
w0/x0 for the return value as the out-of-line LL-SC ops followed the AAPCS
parameter passing rules.
* "+Q" (v->counter) -- necessary to clobber the memory location, "Q"
specifically to ensure the compiler didn't expect some write-back addressing
mode to have adjusted a register.
* "r" (x1) -- necessary to ensure the address was available in X1 for the LL-SC
code, since I believe the "+Q" constraint *could* have used a copy in another
register. I could be mistaken on this one, since there's no rationale in the
original commit message or code.
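With the out-of-line LL/SC path gone, none of those requirements apply to the
simple no-return ops any more. As a rough standalone sketch (hypothetical
helper name, simplified from the kernel macros, and needing an LSE-capable
-march to build), the relaxed constraints amount to something like:

static inline void lse_add(int i, int *p)
{
	asm volatile(
	/* value is input-only; no fixed registers, no x30 clobber */
	"	stadd	%w[i], %[v]\n"
	: [v] "+Q" (*p)		/* only the location itself is clobbered */
	: [i] "r" (i));
}

i.e. there's nothing tying the value or the address to particular registers,
and no separate "r" operand for the pointer is needed.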
When that approach was thrown away in commits
addfc38672c73efd ("arm64: atomics: avoid out-of-line ll/sc atomics")
3337cb5aea594e40 ("arm64: avoid using hard-coded registers for LSE atomics")
... we stopped hard-coding the registers, and added a `tmp` register for ops
with a return value, but otherwise left the constraints as-is.
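For the value-returning forms, the shape this patch moves towards is roughly
the following (again a simplified standalone sketch with hypothetical names,
not the kernel macros): <Rs> stays input-only, and <Rt> becomes a plain '=r'
output, since it's only written after the inputs are consumed:

static inline int lse_fetch_add(int i, int *p)
{
	int old;

	asm volatile(
	"	ldadd	%w[i], %w[old], %[v]\n"
	: [v] "+Q" (*p),
	  [old] "=r" (old)	/* written after %w[i] is consumed, so no '=&r' */
	: [i] "r" (i));

	return old;
}

That leaves the compiler free to allocate `i` and `old` to the same register
where that's profitable, rather than having to keep the input live in a '+r'
operand.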
> Anywho:
>
> Acked-by: Will Deacon <will at kernel.org>
Thanks!
Mark.