[PATCH 13/20] arm64/fpsimd: Make clone() compatible with ZA lazy saving

Will Deacon will at kernel.org
Wed May 7 09:11:38 PDT 2025


On Wed, May 07, 2025 at 04:22:06PM +0100, Mark Rutland wrote:
> On Wed, May 07, 2025 at 03:58:01PM +0100, Will Deacon wrote:
> > On Tue, May 06, 2025 at 04:25:16PM +0100, Mark Rutland wrote:
> > > @@ -441,14 +449,39 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
> > >  				childregs->sp = stack_start;
> > >  		}
> > >  
> > > +		/*
> > > +		 * Due to the AAPCS64 "ZA lazy saving scheme", PSTATE.ZA and
> > > +		 * TPIDR2 need to be manipulated as a pair, and either both
> > > +		 * need to be inherited or both need to be reset.
> > > +		 *
> > > +		 * Within a process, child threads must not inherit their
> > > +		 * parent's TPIDR2 value or they may clobber their parent's
> > > +		 * stack at some later point.
> > > +		 *
> > > +		 * When a process is fork()'d, the child must inherit ZA and
> > > +		 * TPIDR2 from its parent in case there was dormant ZA state.
> > > +		 *
> > > +		 * Use CLONE_VM to determine when the child will share the
> > > +		 * address space with the parent, and cannot safely inherit the
> > > +		 * state.
> > > +		 */
> > > +		if (system_supports_sme()) {
> > > +			if (!(clone_flags & CLONE_VM)) {
> > > +				p->thread.tpidr2_el0 = read_sysreg_s(SYS_TPIDR2_EL0);
> > 
> > Why do we need to re-read this register given that we did this just a few
> > lines earlier?
> 
> Sorry -- I had meant to delete the earlier read. My intent was to centralise
> manipulation of TPIDR2 (and ZA) in this block so that it was clear that they
> were manipulated as a pair.
> 
> I will delete the earlier read, and make this:
> 
> | if (system_supports_sme()) {
> | 	if (!(clone_flags & CLONE_VM)) {
> | 		p->thread.tpidr2_el0 = read_sysreg_s(SYS_TPIDR2_EL0);
> | 		ret = copy_thread_za(p, current);
> | 		if (ret)
> | 			return ret;
> | 	} else {
> | 		p->thread.tpidr2_el0 = 0;

If we context-switch here, can we end up reading the register value
back into the thread structure?

> | 		WARN_ON_ONCE(p->thread.svcr & SVCR_ZA_MASK);
> | 	}
> | }
> 
> ... or I can clear TPIDR2 in arch_dup_task_struct() along with ZA, delete the
> earlier read here, and make this:
> 
> | if (system_supports_sme() && !(clone_flags & CLONE_VM)) {
> | 	p->thread.tpidr2_el0 = read_sysreg_s(SYS_TPIDR2_EL0);
> | 	ret = copy_thread_za(p, current);
> | 	if (ret)
> | 		return ret;
> | }
> 
> Any preference?

I don't mind, assuming they both work :)

Will



More information about the linux-arm-kernel mailing list