[PATCH 13/20] arm64/fpsimd: Make clone() compatible with ZA lazy saving

Will Deacon will at kernel.org
Wed May 7 07:58:01 PDT 2025


On Tue, May 06, 2025 at 04:25:16PM +0100, Mark Rutland wrote:
> @@ -441,14 +449,39 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
>  				childregs->sp = stack_start;
>  		}
>  
> +		/*
> +		 * Due to the AAPCS64 "ZA lazy saving scheme", PSTATE.ZA and
> +		 * TPIDR2 need to be manipulated as a pair, and either both
> +		 * need to be inherited or both need to be reset.
> +		 *
> +		 * Within a process, child threads must not inherit their
> +		 * parent's TPIDR2 value or they may clobber their parent's
> +		 * stack at some later point.
> +		 *
> +		 * When a process is fork()'d, the child must inherit ZA and
> +		 * TPIDR2 from its parent in case there was dormant ZA state.
> +		 *
> +		 * Use CLONE_VM to determine when the child will share the
> +		 * address space with the parent, and cannot safely inherit the
> +		 * state.
> +		 */
> +		if (system_supports_sme()) {
> +			if (!(clone_flags & CLONE_VM)) {
> +				p->thread.tpidr2_el0 = read_sysreg_s(SYS_TPIDR2_EL0);

Why do we need to re-read this register given that we did this just a few
lines earlier?
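
For context, the inherit-or-reset rule described in the quoted comment can be
modelled in plain user-space C roughly as below. This is only a sketch, not
the kernel code: `struct sketch_thread`, `sketch_copy_sme_state()` and
`SKETCH_CLONE_VM` are hypothetical names made up for illustration, and a
simple boolean stands in for PSTATE.ZA.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Same numeric value as CLONE_VM in <linux/sched.h>; redefined under a
 * hypothetical name so the sketch builds without kernel headers.
 */
#define SKETCH_CLONE_VM 0x00000100UL

/* Hypothetical stand-in for the per-task state handled in copy_thread(). */
struct sketch_thread {
	uint64_t tpidr2_el0;	/* models TPIDR2_EL0 (lazy-save block pointer) */
	bool za_enabled;	/* models PSTATE.ZA */
};

/*
 * Treat TPIDR2 and ZA as a pair: either both are inherited or both are
 * reset, depending on whether the child shares the parent's address space.
 */
static struct sketch_thread
sketch_copy_sme_state(const struct sketch_thread *parent,
		      unsigned long clone_flags)
{
	struct sketch_thread child = { 0 };	/* default: both reset */

	/* fork()-like clone: separate address space, so inherit both. */
	if (!(clone_flags & SKETCH_CLONE_VM))
		child = *parent;

	/* CLONE_VM: shared address space, so the child keeps the reset pair. */
	return child;
}

/* Small self-check exercising both paths; aborts if the rule is violated. */
static void sketch_self_check(void)
{
	struct sketch_thread parent = { .tpidr2_el0 = 0x1000, .za_enabled = true };
	struct sketch_thread forked = sketch_copy_sme_state(&parent, 0);
	struct sketch_thread thread = sketch_copy_sme_state(&parent, SKETCH_CLONE_VM);

	assert(forked.tpidr2_el0 == parent.tpidr2_el0 && forked.za_enabled);
	assert(thread.tpidr2_el0 == 0 && !thread.za_enabled);
}
```

The point of the model is that the two fields never diverge: there is no path
where the child ends up with the parent's TPIDR2 but a reset ZA, or vice versa.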

> diff --git a/tools/testing/selftests/arm64/abi/tpidr2.c b/tools/testing/selftests/arm64/abi/tpidr2.c
> index 285c47dd42f63..eb19dcc37a755 100644
> --- a/tools/testing/selftests/arm64/abi/tpidr2.c
> +++ b/tools/testing/selftests/arm64/abi/tpidr2.c
> @@ -169,8 +169,10 @@ static int sys_clone(unsigned long clone_flags, unsigned long newsp,
>  			   child_tidptr);
>  }
>  
> +#define __STACK_SIZE (8 * 1024 * 1024)
> +
>  /*
> - * If we clone with CLONE_SETTLS then the value in the parent should
> + * If we clone with CLONE_VM then the value in the parent should
>   * be unchanged and the child should start with zero and be able to
>   * set its own value.
>   */
> @@ -179,11 +181,19 @@ static int write_clone_read(void)
>  	int parent_tid, child_tid;
>  	pid_t parent, waiting;
>  	int ret, status;
> +	void *stack;
>  
>  	parent = getpid();
>  	set_tpidr2(parent);
>  
> -	ret = sys_clone(CLONE_SETTLS, 0, &parent_tid, 0, &child_tid);
> +	stack = malloc(__STACK_SIZE);
> +	if (!stack) {
> +		putstr("# malloc() failed\n");
> +		return 0;
> +	}
> +
> +	ret = sys_clone(CLONE_VM, (unsigned long)stack + __STACK_SIZE,
> +			&parent_tid, 0, &child_tid);
>  	if (ret == -1) {
>  		putstr("# clone() failed\n");
>  		putnum(errno);

Thank you for updating the selftest, but please can you put it in a
separate patch?

Cheers,

Will


