[PATCH 12/18] arm64: fpsimd: Move fpsimd save/restore inline

Mark Rutland mark.rutland at arm.com
Wed May 27 08:34:57 PDT 2026


On Wed, May 27, 2026 at 03:49:18PM +0100, Vladimir Murzin wrote:
> On 5/21/26 14:25, Mark Rutland wrote:
> > +static inline void fpsimd_save_vregs(struct user_fpsimd_state *state)
> > +{
> > +	instrument_write(state->vregs, sizeof(state->vregs));
> > +	asm volatile(
> > +	__FPSIMD_PREAMBLE
> > +	"	stp	q0,  q1,  [%[vregs], #16 * 0]\n"
> > +	"	stp	q2,  q3,  [%[vregs], #16 * 2]\n"
> > +	"	stp	q4,  q5,  [%[vregs], #16 * 4]\n"
> > +	"	stp	q6,  q7,  [%[vregs], #16 * 6]\n"
> > +	"	stp	q8,  q9,  [%[vregs], #16 * 8]\n"
> > +	"	stp	q10, q11, [%[vregs], #16 * 10]\n"
> > +	"	stp	q12, q13, [%[vregs], #16 * 12]\n"
> > +	"	stp	q14, q15, [%[vregs], #16 * 14]\n"
> > +	"	stp	q16, q17, [%[vregs], #16 * 16]\n"
> > +	"	stp	q18, q19, [%[vregs], #16 * 18]\n"
> > +	"	stp	q20, q21, [%[vregs], #16 * 20]\n"
> > +	"	stp	q22, q23, [%[vregs], #16 * 22]\n"
> > +	"	stp	q24, q25, [%[vregs], #16 * 24]\n"
> > +	"	stp	q26, q27, [%[vregs], #16 * 26]\n"
> > +	"	stp	q28, q29, [%[vregs], #16 * 28]\n"
> > +	"	stp	q30, q31, [%[vregs], #16 * 30]\n"
> > +	: "=Q" (state->vregs)
> > +	: [vregs] "r" (state->vregs)
> 
> Missing "memory" clobber here?

Here the "=Q" constraint is sufficient.

The "=Q" output constraint tells the compiler that the operand is
written to and that its old value is not read. It behaves like an "=m"
constraint, but requires the address to be held in a single base
register with no offset (and no writeback addressing mode). Here it
covers the entirety of the vregs array.
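
As a minimal sketch of the above (not code from the patch; store_u64()
is a hypothetical helper, and the non-arm64 branch is only there so the
snippet builds elsewhere), a single store can be described with "=Q"
alone:

```c
#include <stdint.h>

/* Hypothetical helper for illustration: write 'val' to '*slot'. */
static inline void store_u64(uint64_t *slot, uint64_t val)
{
#if defined(__aarch64__)
	/*
	 * "=Q" marks *slot as written, with its old value not read.
	 * %[dst] is printed as a bare base register, e.g. "[x0]".
	 * There is no "memory" clobber, so the compiler may keep
	 * unrelated memory cached in registers across the asm.
	 */
	__asm__ __volatile__(
	"	str	%[v], %[dst]\n"
	: [dst] "=Q" (*slot)
	: [v] "r" (val));
#else
	*slot = val;	/* Plain C fallback for other architectures. */
#endif
}
```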

See https://gcc.gnu.org/onlinedocs/gcc/Machine-Constraints.html :

|  Q
|     A memory address which uses a single base register with no offset

We generally prefer to use "Q" constraints rather than memory clobbers
where possible, since that gives the compiler more freedom (e.g. by
*not* clobbering unrelated memory locations).

The "Q" constraint causes the output to be formatted as a memory address
(e.g. "[x0]"), so to be able to apply an offset we need a separate "r"
constraint to get the base register. It doesn't matter whether the
compiler happens to use a different register for that (and in practice
compilers realise they can use the register allocated for the "Q"
constraint).
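
A cut-down sketch of that pattern, using general-purpose registers
rather than the q registers (struct quad_buf and store_four() are
hypothetical names, not from the patch; the non-arm64 branch is only a
build fallback):

```c
#include <stdint.h>

/* Hypothetical buffer type for illustration. */
struct quad_buf {
	uint64_t v[4];
};

static inline void store_four(struct quad_buf *buf, uint64_t a, uint64_t b,
			      uint64_t c, uint64_t d)
{
#if defined(__aarch64__)
	/*
	 * The unnamed "=Q" operand is never referenced in the template;
	 * it only tells the compiler that all of buf->v is written.
	 * The separate "r" operand supplies a plain base register that
	 * the stp instructions can add an immediate offset to.
	 */
	__asm__ __volatile__(
	"	stp	%[a], %[b], [%[base], #8 * 0]\n"
	"	stp	%[c], %[d], [%[base], #8 * 2]\n"
	: "=Q" (buf->v)
	: [base] "r" (buf->v),
	  [a] "r" (a), [b] "r" (b), [c] "r" (c), [d] "r" (d));
#else
	/* Plain C fallback for other architectures. */
	buf->v[0] = a;
	buf->v[1] = b;
	buf->v[2] = c;
	buf->v[3] = d;
#endif
}
```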

Unfortunately we can't use "Q" for the scalable registers, since the
size isn't known at compile time, and the simplest option for those is
to use a memory clobber.
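
To illustrate the fallback (this is a stand-in, not the SVE code from
the series; fill_words() is a hypothetical helper and the non-arm64
branch is only a build fallback), an asm block that writes a
runtime-sized buffer can use a memory clobber instead:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper for illustration: set dst[0..n-1] to 'val'. */
static inline void fill_words(uint64_t *dst, size_t n, uint64_t val)
{
#if defined(__aarch64__)
	uint64_t *p = dst;
	size_t left = n;

	/*
	 * The number of bytes written depends on 'n', so this sketch
	 * does not describe the destination as a memory operand.
	 * The "memory" clobber instead tells the compiler that any
	 * memory may have been modified.
	 */
	__asm__ __volatile__(
	"1:	cbz	%[left], 2f\n"
	"	str	%[val], [%[p]], #8\n"
	"	sub	%[left], %[left], #1\n"
	"	b	1b\n"
	"2:\n"
	: [p] "+r" (p), [left] "+r" (left)
	: [val] "r" (val)
	: "memory");
#else
	/* Plain C fallback for other architectures. */
	for (size_t i = 0; i < n; i++)
		dst[i] = val;
#endif
}
```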

[...]

> > +static inline void fpsimd_load_vregs(const struct user_fpsimd_state *state)
> > +{
> > +	instrument_read(state->vregs, sizeof(state->vregs));
> > +	asm volatile(
> > +	__FPSIMD_PREAMBLE
> > +	"	ldp	q0,  q1,  [%[vregs], #16 * 0]\n"
> > +	"	ldp	q2,  q3,  [%[vregs], #16 * 2]\n"
> > +	"	ldp	q4,  q5,  [%[vregs], #16 * 4]\n"
> > +	"	ldp	q6,  q7,  [%[vregs], #16 * 6]\n"
> > +	"	ldp	q8,  q9,  [%[vregs], #16 * 8]\n"
> > +	"	ldp	q10, q11, [%[vregs], #16 * 10]\n"
> > +	"	ldp	q12, q13, [%[vregs], #16 * 12]\n"
> > +	"	ldp	q14, q15, [%[vregs], #16 * 14]\n"
> > +	"	ldp	q16, q17, [%[vregs], #16 * 16]\n"
> > +	"	ldp	q18, q19, [%[vregs], #16 * 18]\n"
> > +	"	ldp	q20, q21, [%[vregs], #16 * 20]\n"
> > +	"	ldp	q22, q23, [%[vregs], #16 * 22]\n"
> > +	"	ldp	q24, q25, [%[vregs], #16 * 24]\n"
> > +	"	ldp	q26, q27, [%[vregs], #16 * 26]\n"
> > +	"	ldp	q28, q29, [%[vregs], #16 * 28]\n"
> > +	"	ldp	q30, q31, [%[vregs], #16 * 30]\n"
> > +	:
> > +	: "Q" (state->vregs),
> > +	  [vregs] "r" (state->vregs)
> 
> Missing "memory" clobber here?

Same story as for fpsimd_save_vregs() above, except that here the "Q"
input constraint describes that the entirety of the operand is read from
but not written to.
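
A minimal sketch of the load side (struct two_words and load_sum() are
hypothetical names, not from the patch; the non-arm64 branch is only a
build fallback):

```c
#include <stdint.h>

/* Hypothetical type for illustration. */
struct two_words {
	uint64_t v[2];
};

static inline uint64_t load_sum(const struct two_words *w)
{
	uint64_t lo, hi;

#if defined(__aarch64__)
	/*
	 * The unnamed "Q" input tells the compiler that all of w->v is
	 * read, so earlier stores to it must be complete before the asm.
	 * The "r" input supplies the base register used by ldp.
	 */
	__asm__(
	"	ldp	%[lo], %[hi], [%[base]]\n"
	: [lo] "=r" (lo), [hi] "=r" (hi)
	: "Q" (w->v),
	  [base] "r" (w->v));
#else
	/* Plain C fallback for other architectures. */
	lo = w->v[0];
	hi = w->v[1];
#endif
	return lo + hi;
}
```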

Mark.


