[PATCH 12/18] arm64: fpsimd: Move fpsimd save/restore inline

Vladimir Murzin vladimir.murzin at arm.com
Wed May 27 09:13:44 PDT 2026


On 5/27/26 16:34, Mark Rutland wrote:
> On Wed, May 27, 2026 at 03:49:18PM +0100, Vladimir Murzin wrote:
>> On 5/21/26 14:25, Mark Rutland wrote:
>>> +static inline void fpsimd_save_vregs(struct user_fpsimd_state *state)
>>> +{
>>> +	instrument_write(state->vregs, sizeof(state->vregs));
>>> +	asm volatile(
>>> +	__FPSIMD_PREAMBLE
>>> +	"	stp	q0,  q1,  [%[vregs], #16 * 0]\n"
>>> +	"	stp	q2,  q3,  [%[vregs], #16 * 2]\n"
>>> +	"	stp	q4,  q5,  [%[vregs], #16 * 4]\n"
>>> +	"	stp	q6,  q7,  [%[vregs], #16 * 6]\n"
>>> +	"	stp	q8,  q9,  [%[vregs], #16 * 8]\n"
>>> +	"	stp	q10, q11, [%[vregs], #16 * 10]\n"
>>> +	"	stp	q12, q13, [%[vregs], #16 * 12]\n"
>>> +	"	stp	q14, q15, [%[vregs], #16 * 14]\n"
>>> +	"	stp	q16, q17, [%[vregs], #16 * 16]\n"
>>> +	"	stp	q18, q19, [%[vregs], #16 * 18]\n"
>>> +	"	stp	q20, q21, [%[vregs], #16 * 20]\n"
>>> +	"	stp	q22, q23, [%[vregs], #16 * 22]\n"
>>> +	"	stp	q24, q25, [%[vregs], #16 * 24]\n"
>>> +	"	stp	q26, q27, [%[vregs], #16 * 26]\n"
>>> +	"	stp	q28, q29, [%[vregs], #16 * 28]\n"
>>> +	"	stp	q30, q31, [%[vregs], #16 * 30]\n"
>>> +	: "=Q" (state->vregs)
>>> +	: [vregs] "r" (state->vregs)
>> Missing "memory" clobber here?
> Here the "=Q" constraint is sufficient.
> 
> The "=Q" output constraint describes that the operand is written to but
> the old value is not read. It behaves like an "=m" constraint, but
> places the base address in a single register without any offset (and no
> writeback addressing mode). Here it applies to the entirety of the
> vregs array.
> 
> See https://gcc.gnu.org/onlinedocs/gcc/Machine-Constraints.html :
> 
> |  Q
> |     A memory address which uses a single base register with no offset
> 
> We generally prefer to use "Q" constraints rather than memory clobbers
> where possible, since it gives the compiler more freedom (e.g. due to
> *not* clobbering unrelated memory locations). 
> 
> The "Q" constraint causes the output to be formatted as a memory address
> (e.g. "[x0]"), so to be able to apply an offset we need a separate "r"
> constraint to get the base register. It doesn't matter whether the
> compiler happens to use a different register for that (and in practice
> compilers realise they can use the register allocated for the "Q"
> constraint).
> 
> Unfortunately we can't use "Q" for the scalable registers, since the
> size isn't known at compile time, and the simplest option for those is
> to use a memory clobber.
> 

That's quite an interesting bit of information, thanks for the explanation!
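
For anyone following along, here is a minimal sketch of the same pattern on
a small buffer. It is not code from the patch: struct demo_save_buf and
demo_save_vals() are hypothetical names made up for illustration, and the
non-arm64 branch is a plain C fallback added only so the snippet builds
elsewhere. The "=Q" output tells the compiler that all of buf->vals is
written, and the separate "r" operand supplies the base register that the
offsets are applied to, so no "memory" clobber is needed.

```c
#include <stdint.h>

/* Hypothetical buffer, standing in for the vregs array in the patch. */
struct demo_save_buf {
	uint64_t vals[4];
};

/* Hypothetical helper: stores a, b, b, a into buf->vals[0..3]. */
static inline void demo_save_vals(struct demo_save_buf *buf,
				  uint64_t a, uint64_t b)
{
#if defined(__aarch64__)
	asm volatile(
	"	stp	%[a], %[b], [%[base], #8 * 0]\n"
	"	stp	%[b], %[a], [%[base], #8 * 2]\n"
	/* "=Q": the whole array is written, old contents are not read. */
	: "=Q" (buf->vals)
	/* "r": plain base register, so an offset can be added in the asm. */
	: [base] "r" (buf->vals), [a] "r" (a), [b] "r" (b)
	/* No "memory" clobber: unrelated memory is not affected. */
	);
#else
	/* Assumption: portable fallback so the sketch builds off arm64. */
	buf->vals[0] = a;
	buf->vals[1] = b;
	buf->vals[2] = b;
	buf->vals[3] = a;
#endif
}
```

After calling demo_save_vals(&buf, 1, 2), a caller can read buf.vals
directly; the compiler knows the array was written by the asm and will not
reuse stale values for it.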

> [...]
> 
>>> +static inline void fpsimd_load_vregs(const struct user_fpsimd_state *state)
>>> +{
>>> +	instrument_read(state->vregs, sizeof(state->vregs));
>>> +	asm volatile(
>>> +	__FPSIMD_PREAMBLE
>>> +	"	ldp	q0,  q1,  [%[vregs], #16 * 0]\n"
>>> +	"	ldp	q2,  q3,  [%[vregs], #16 * 2]\n"
>>> +	"	ldp	q4,  q5,  [%[vregs], #16 * 4]\n"
>>> +	"	ldp	q6,  q7,  [%[vregs], #16 * 6]\n"
>>> +	"	ldp	q8,  q9,  [%[vregs], #16 * 8]\n"
>>> +	"	ldp	q10, q11, [%[vregs], #16 * 10]\n"
>>> +	"	ldp	q12, q13, [%[vregs], #16 * 12]\n"
>>> +	"	ldp	q14, q15, [%[vregs], #16 * 14]\n"
>>> +	"	ldp	q16, q17, [%[vregs], #16 * 16]\n"
>>> +	"	ldp	q18, q19, [%[vregs], #16 * 18]\n"
>>> +	"	ldp	q20, q21, [%[vregs], #16 * 20]\n"
>>> +	"	ldp	q22, q23, [%[vregs], #16 * 22]\n"
>>> +	"	ldp	q24, q25, [%[vregs], #16 * 24]\n"
>>> +	"	ldp	q26, q27, [%[vregs], #16 * 26]\n"
>>> +	"	ldp	q28, q29, [%[vregs], #16 * 28]\n"
>>> +	"	ldp	q30, q31, [%[vregs], #16 * 30]\n"
>>> +	:
>>> +	: "Q" (state->vregs),
>>> +	  [vregs] "r" (state->vregs)
>> Missing "memory" clobber here?
> Same story as for fpsimd_save_vregs() above, except that here the "Q"
> input constraint describes that the entirety of the operand is read from
> but not written to.
> 

I'd be surprised if it was different :)
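
The load side can be sketched the same way. Again this is only an
illustration under my own assumptions: struct demo_load_buf and
demo_load_sum() are hypothetical names, and the non-arm64 branch is a plain
C fallback. The "Q" input tells the compiler that all of buf->vals is read
by the asm, so any pending stores to it must be completed first, while
nothing else in memory is treated as touched.

```c
#include <stdint.h>

/* Hypothetical buffer, standing in for the vregs array in the patch. */
struct demo_load_buf {
	uint64_t vals[4];
};

/* Hypothetical helper: loads vals[0..3] and returns their sum. */
static inline uint64_t demo_load_sum(const struct demo_load_buf *buf)
{
	uint64_t a, b, c, d;

#if defined(__aarch64__)
	asm volatile(
	"	ldp	%[a], %[b], [%[base], #8 * 0]\n"
	"	ldp	%[c], %[d], [%[base], #8 * 2]\n"
	/* Early-clobber: outputs are written before base is last read. */
	: [a] "=&r" (a), [b] "=&r" (b), [c] "=&r" (c), [d] "=&r" (d)
	/* "Q": the whole array is read; "r": base register for offsets. */
	: "Q" (buf->vals), [base] "r" (buf->vals)
	);
#else
	/* Assumption: portable fallback so the sketch builds off arm64. */
	a = buf->vals[0];
	b = buf->vals[1];
	c = buf->vals[2];
	d = buf->vals[3];
#endif
	return a + b + c + d;
}
```

The early-clobber markers matter in this sketch because the first ldp
writes its destination registers while the base register is still needed
by the second ldp.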

Cheers
Vladimir


> Mark.
> 