[PATCH 4/5] arm64: atomics: lse: improve constraints for simple ops

Tue Dec 14 05:04:02 PST 2021

On Mon, Dec 13, 2021 at 07:40:23PM +0000, Will Deacon wrote:
> On Fri, Dec 10, 2021 at 03:14:09PM +0000, Mark Rutland wrote:
> > We have overly conservative assembly constraints for the basic FEAT_LSE
> > atomic instructions, and using more accurate and permissive constraints
> > will allow for better code generation.
> > 
> > The FEAT_LSE basic atomic instructions have come in two forms:
> > 
> > 	LD{op}{order}{size} <Rs>, <Rt>, [<Rn>]
> > 	ST{op}{order}{size} <Rs>, [<Rn>]

> > For either form, both <Rs> and <Rn> are read but not written back to,
> > and <Rt> is written with the original value of the memory location.
> > Where (<Rt> == <Rs>) or (<Rt> == <Rn>), <Rt> is written *after* the
> > other register value(s) are consumed. There are no UNPREDICTABLE or
> > CONSTRAINED UNPREDICTABLE behaviours when any pair of <Rs>, <Rt>, or
> > <Rn> are the same register.

> > Our current inline assembly always uses <Rs> == <Rt>, treating this
> > register as both an input and an output (using a '+r' constraint). This
> > forces the compiler to do some unnecessary register shuffling and/or
> > redundant value generation.

> > We can improve this with more accurate constraints, separating <Rs> and
> > <Rt>, where <Rs> is an input-only register ('r'), and <Rt> is an
> > output-only value ('=r'). As <Rt> is written back after <Rs> is
> > consumed, it does not need to be earlyclobber ('=&r'), leaving the
> > compiler free to use the same register for both <Rs> and <Rt> where this
> > is desirable.
> > 
> > At the same time, the redundant 'r' constraint for `v` is removed, as
> > the `+Q` constraint is sufficient.

> Makes sense to me. I'm not sure _why_ the current constraints are so weird;
> maybe a hangover from when we patched them inline?

Yup; the constraints used to make sense when we were patching the instructions,
and we just never cleaned them up. Details below -- I can add a link tag to
this mail if you'd like?

For example, back when we introduced the ops in commit:

  c0385b24af15020a ("arm64: introduce CONFIG_ARM64_LSE_ATOMICS as fallback to ll/sc atomics")

We had:

| #define ATOMIC_OP(op, asm_op)                                          \
| static inline void atomic_##op(int i, atomic_t *v)                     \
| {                                                                      \
|        register int w0 asm ("w0") = i;                                 \
|        register atomic_t *x1 asm ("x1") = v;                           \
|                                                                        \
|        asm volatile(                                                   \
|        __LL_SC_CALL(op)                                                \
|        : "+r" (w0), "+Q" (v->counter)                                  \
|        : "r" (x1)                                                      \
|        : "x30");                                                       \
| }                                                                      \

IIUC, for those constraints:

* "+r" (w0) -- necessary because the LL-SC forms might clobber w0 (e.g. if
  used for their `tmp` variable). For the value-returning ops we had to use
  w0/x0 for the return value as the out-of-line LL-SC ops followed the AAPCS
  parameter passing rules.

* "+Q" (v->counter) -- necessary to clobber the memory location, "Q"
  specifically to ensure the compiler didn't expect some write-back addressing
  mode to have adjusted a register.

* "r" (x1) -- necessary to ensure the address was available in X1 for the LL-SC
  code, since I beleive the "+Q" constraint *could* have used a copy in another
  register. I could be mistaken on this one, since there's no rationale in the
  original commit message or code.

When that approach was thrown away in commits

  addfc38672c73efd ("arm64: atomics: avoid out-of-line ll/sc atomics")
  3337cb5aea594e40 ("arm64: avoid using hard-coded registers for LSE atomics")

... we stopped hard-coding the registers, and added a `tmp` register for ops
with a return value, but otherwise left the constraints as-is.

> Anywho:
> 
> Acked-by: Will Deacon <will at kernel.org>

Thanks!

Mark.