[PATCH v8 04/12] arm64: support WFET in smp_cond_relaxed_timeout()

Will Deacon will at kernel.org
Fri Jan 9 06:16:17 PST 2026


On Fri, Jan 09, 2026 at 01:05:57AM -0800, Ankur Arora wrote:
> 
> Will Deacon <will at kernel.org> writes:
> 
> > On Sun, Dec 14, 2025 at 08:49:11PM -0800, Ankur Arora wrote:
> >> +#define __CMPWAIT_CASE(w, sfx, sz)						\
> >> +static inline void __cmpwait_case_##sz(volatile void *ptr,			\
> >> +				       unsigned long val,			\
> >> +				       s64 timeout_ns)				\
> >> +{										\
> >> +	unsigned long tmp;							\
> >> +										\
> >> +	if (!alternative_has_cap_unlikely(ARM64_HAS_WFXT) || timeout_ns <= 0) {	\
> >> +		asm volatile(							\
> >> +		"	sevl\n"							\
> >> +		"	wfe\n"							\
> >> +		"	ldxr" #sfx "\t%" #w "[tmp], %[v]\n"			\
> >> +		"	eor	%" #w "[tmp], %" #w "[tmp], %" #w "[val]\n"	\
> >> +		"	cbnz	%" #w "[tmp], 1f\n"				\
> >> +		"	wfe\n"							\
> >> +		"1:"								\
> >> +		: [tmp] "=&r" (tmp), [v] "+Q" (*(u##sz *)ptr)			\
> >> +		: [val] "r" (val));						\
> >> +	} else {								\
> >> +		u64 ecycles = arch_timer_read_counter() +			\
> >> +				NSECS_TO_CYCLES(timeout_ns);			\
> >> +		asm volatile(							\
> >> +		"	sevl\n"							\
> >> +		"	wfe\n"							\
> >> +		"	ldxr" #sfx "\t%" #w "[tmp], %[v]\n"			\
> >> +		"	eor	%" #w "[tmp], %" #w "[tmp], %" #w "[val]\n"	\
> >> +		"	cbnz	%" #w "[tmp], 2f\n"				\
> >> +		"	msr s0_3_c1_c0_0, %[ecycles]\n"				\
> >> +		"2:"								\
> >> +		: [tmp] "=&r" (tmp), [v] "+Q" (*(u##sz *)ptr)			\
> >> +		: [val] "r" (val), [ecycles] "r" (ecycles));			\
> >> +	}									\
> >
> > Why not have a separate helper for the WFXT version and avoid the runtime
> > check on timeout_ns?
> 
> My main reason for keeping them together was that a separate helper
> needed duplication of a lot of the __CMPWAIT_CASE and __CMPWAIT_GEN
> stuff.
> 
> Relooking at it, I think we could get by without duplicating the
> __CMPWAIT_GEN (the WFE helper just needs to take an unused timeout_ns
> parameter).
> 
> But, it seems overkill to get rid of the unnecessary check on timeout_ns
> (which AFAICT should be well predicted) and the duplicate static branch.

tbh, I was actually struggling to see what the check achieves. In fact,
why is 'timeout_ns' signed in the first place? Has BPF invented time
travel now? :p

If the requested timeout is 0 then we should return immediately (or issue
a WFET which will wake up immediately).

If the requested timeout is negative, then WFET may end up with a really
long timeout, but that should still be no worse than a plain WFE.
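
To illustrate the zero and negative cases above, here is a minimal userspace
sketch of the deadline arithmetic. It is not the patch's code: ns_to_cycles()
is a hypothetical stand-in for NSECS_TO_CYCLES() (whose definition is not
shown in this thread), the 100 MHz counter rate is an assumption, and
wfet_deadline() is a made-up name. It only shows that, if the conversion
happens in unsigned arithmetic, a zero timeout yields a deadline equal to
"now" while a negative one yields a deadline far in the future.

```c
#include <stdint.h>

/* Assumption: 100 MHz counter, i.e. one cycle every 10 ns. */
#define HYPOTHETICAL_NS_PER_CYCLE 10u

/*
 * Hypothetical stand-in for NSECS_TO_CYCLES(). The signed timeout is
 * converted to unsigned before scaling, so a negative value turns into
 * a very large cycle count.
 */
static uint64_t ns_to_cycles(int64_t timeout_ns)
{
	return (uint64_t)timeout_ns / HYPOTHETICAL_NS_PER_CYCLE;
}

/* Deadline in counter cycles; the unsigned addition wraps modulo 2^64. */
static uint64_t wfet_deadline(uint64_t now, int64_t timeout_ns)
{
	return now + ns_to_cycles(timeout_ns);
}
```

With these assumptions, wfet_deadline(now, 0) equals now, so a timed wait
would expire at once, and wfet_deadline(now, -1) lands roughly UINT64_MAX / 10
cycles ahead, which behaves like an effectively unbounded wait.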

If the check is only there to de-multiplex __cmpwait() vs
__cmpwait_relaxed_timeout() as the caller, then I think we should just
avoid muxing them in the first place. The duplication argument doesn't
hold a lot of water when the asm block is already mostly copy-paste.

Will


