[PATCH v8 04/12] arm64: support WFET in smp_cond_relaxed_timeout()

Ankur Arora ankur.a.arora at oracle.com
Tue Jan 20 14:49:58 PST 2026


Will Deacon <will at kernel.org> writes:

> On Fri, Jan 09, 2026 at 11:05:06AM -0800, Ankur Arora wrote:
>> 
>> Will Deacon <will at kernel.org> writes:
>> 
>> > On Sun, Dec 14, 2025 at 08:49:11PM -0800, Ankur Arora wrote:
>> >> Extend __cmpwait_relaxed() to __cmpwait_relaxed_timeout() which takes
>> >> an additional timeout value in ns.
>> >>
>> >> Lacking WFET, or with zero or negative value of timeout we fallback
>> >> to WFE.
>> >>
>> >> Cc: Arnd Bergmann <arnd at arndb.de>
>> >> Cc: Catalin Marinas <catalin.marinas at arm.com>
>> >> Cc: Will Deacon <will at kernel.org>
>> >> Cc: linux-arm-kernel at lists.infradead.org
>> >> Signed-off-by: Ankur Arora <ankur.a.arora at oracle.com>
>> >> ---
>> >>  arch/arm64/include/asm/barrier.h |  8 ++--
>> >>  arch/arm64/include/asm/cmpxchg.h | 72 ++++++++++++++++++++++----------
>> >>  2 files changed, 55 insertions(+), 25 deletions(-)
>> >
>> > Sorry, just spotted something else on this...
>> >
>> >> diff --git a/arch/arm64/include/asm/barrier.h b/arch/arm64/include/asm/barrier.h
>> >> index 6190e178db51..fbd71cd4ef4e 100644
>> >> --- a/arch/arm64/include/asm/barrier.h
>> >> +++ b/arch/arm64/include/asm/barrier.h
>> >> @@ -224,8 +224,8 @@ do {									\
>> >>  extern bool arch_timer_evtstrm_available(void);
>> >>
>> >>  /*
>> >> - * In the common case, cpu_poll_relax() sits waiting in __cmpwait_relaxed()
>> >> - * for the ptr value to change.
>> >> + * In the common case, cpu_poll_relax() sits waiting in __cmpwait_relaxed()/
>> >> + * __cmpwait_relaxed_timeout() for the ptr value to change.
>> >>   *
>> >>   * Since this period is reasonably long, choose SMP_TIMEOUT_POLL_COUNT
>> >>   * to be 1, so smp_cond_load_{relaxed,acquire}_timeout() does a
>> >> @@ -234,7 +234,9 @@ extern bool arch_timer_evtstrm_available(void);
>> >>  #define SMP_TIMEOUT_POLL_COUNT	1
>> >>
>> >>  #define cpu_poll_relax(ptr, val, timeout_ns) do {			\
>> >> -	if (arch_timer_evtstrm_available())				\
>> >> +	if (alternative_has_cap_unlikely(ARM64_HAS_WFXT))		\
>> >> +		__cmpwait_relaxed_timeout(ptr, val, timeout_ns);	\
>> >> +	else if (arch_timer_evtstrm_available())			\
>> >>  		__cmpwait_relaxed(ptr, val);				\
>> >
>> > Don't you want to make sure that we have the event stream available for
>> > __cmpwait_relaxed_timeout() too? Otherwise, a large timeout is going to
>> > cause problems.
>> 
>> Would that help though? If called from smp_cond_load_relaxed_timeout()
>> then we would wake up and just call __cmpwait_relaxed_timeout() again.
>
> Fair enough, I can see that. Is it worth capping the maximum timeout
> like we do for udelay()?

The DELAY_CONST_MAX thing?

I'm not sure whether your concern is about the overall timeout or the
timeout per WFET iteration?

For the overall limit, at least rqspinlock has a pretty large timeout
value (NSEC_PER_SEC/4).

However, it might be a good idea to attach a DELAY_CONST_MAX-like limit
when using this interface -- for architectures that do not have an optimized
way of polling (that is, those that do not define ARCH_HAS_CPU_RELAX).

(Currently only x86 defines ARCH_HAS_CPU_RELAX, but I have a series,
meant to go in after this one, that renames it to ARCH_HAS_OPTIMIZED_POLL
and selects it for x86 and arm64 [1].)

But that still might mean that we could have fairly long WFET iterations.
Do you foresee a problem with that?
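
To make the per-iteration question concrete, here is a minimal userspace
sketch of what a cap on each wait might look like. It is only an
illustration: POLL_WAIT_MAX_NS, clamp_wait_ns() and count_wait_slices()
are hypothetical names, the 20us value is arbitrary, and none of this is
taken from the series or from the udelay() code.

```c
#include <stdint.h>

#define NSEC_PER_SEC		1000000000LL

/* Hypothetical per-iteration cap; arbitrary value, not a kernel constant. */
#define POLL_WAIT_MAX_NS	20000LL

/*
 * Clamp the time remaining until the overall deadline to the
 * per-iteration cap. Zero or negative values mean "do not wait".
 */
static int64_t clamp_wait_ns(int64_t remaining_ns)
{
	if (remaining_ns <= 0)
		return 0;
	if (remaining_ns > POLL_WAIT_MAX_NS)
		return POLL_WAIT_MAX_NS;
	return remaining_ns;
}

/*
 * Count how many capped waits it takes to cover timeout_ns, assuming
 * every wait runs to its full (clamped) length with no early wakeup.
 */
static unsigned int count_wait_slices(int64_t timeout_ns)
{
	unsigned int slices = 0;

	while (timeout_ns > 0) {
		timeout_ns -= clamp_wait_ns(timeout_ns);
		slices++;
	}
	return slices;
}
```

With these assumed numbers, an overall timeout of NSEC_PER_SEC/4 would be
split into 12500 bounded waits rather than a single long one.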

[1] https://lore.kernel.org/lkml/20250218213337.377987-1-ankur.a.arora@oracle.com/

Thanks
-- 
ankur
