[PATCH v8 04/12] arm64: support WFET in smp_cond_relaxed_timeout()
Ankur Arora
ankur.a.arora at oracle.com
Tue Jan 20 14:49:58 PST 2026
Will Deacon <will at kernel.org> writes:
> On Fri, Jan 09, 2026 at 11:05:06AM -0800, Ankur Arora wrote:
>>
>> Will Deacon <will at kernel.org> writes:
>>
>> > On Sun, Dec 14, 2025 at 08:49:11PM -0800, Ankur Arora wrote:
>> >> Extend __cmpwait_relaxed() to __cmpwait_relaxed_timeout() which takes
>> >> an additional timeout value in ns.
>> >>
>> >> Lacking WFET, or with a zero or negative timeout value, we fall
>> >> back to WFE.
>> >>
>> >> Cc: Arnd Bergmann <arnd at arndb.de>
>> >> Cc: Catalin Marinas <catalin.marinas at arm.com>
>> >> Cc: Will Deacon <will at kernel.org>
>> >> Cc: linux-arm-kernel at lists.infradead.org
>> >> Signed-off-by: Ankur Arora <ankur.a.arora at oracle.com>
>> >> ---
>> >> arch/arm64/include/asm/barrier.h | 8 ++--
>> >> arch/arm64/include/asm/cmpxchg.h | 72 ++++++++++++++++++++++----------
>> >> 2 files changed, 55 insertions(+), 25 deletions(-)
>> >
>> > Sorry, just spotted something else on this...
>> >
>> >> diff --git a/arch/arm64/include/asm/barrier.h b/arch/arm64/include/asm/barrier.h
>> >> index 6190e178db51..fbd71cd4ef4e 100644
>> >> --- a/arch/arm64/include/asm/barrier.h
>> >> +++ b/arch/arm64/include/asm/barrier.h
>> >> @@ -224,8 +224,8 @@ do { \
>> >> extern bool arch_timer_evtstrm_available(void);
>> >>
>> >> /*
>> >> - * In the common case, cpu_poll_relax() sits waiting in __cmpwait_relaxed()
>> >> - * for the ptr value to change.
>> >> + * In the common case, cpu_poll_relax() sits waiting in __cmpwait_relaxed()/
>> >> + * __cmpwait_relaxed_timeout() for the ptr value to change.
>> >> *
>> >> * Since this period is reasonably long, choose SMP_TIMEOUT_POLL_COUNT
>> >> * to be 1, so smp_cond_load_{relaxed,acquire}_timeout() does a
>> >> @@ -234,7 +234,9 @@ extern bool arch_timer_evtstrm_available(void);
>> >> #define SMP_TIMEOUT_POLL_COUNT 1
>> >>
>> >> #define cpu_poll_relax(ptr, val, timeout_ns) do { \
>> >> - if (arch_timer_evtstrm_available()) \
>> >> + if (alternative_has_cap_unlikely(ARM64_HAS_WFXT)) \
>> >> + __cmpwait_relaxed_timeout(ptr, val, timeout_ns); \
>> >> + else if (arch_timer_evtstrm_available()) \
>> >> __cmpwait_relaxed(ptr, val); \
>> >
>> > Don't you want to make sure that we have the event stream available for
>> > __cmpwait_relaxed_timeout() too? Otherwise, a large timeout is going to
>> > cause problems.
>>
>> Would that help though? If called from smp_cond_load_relaxed_timeout()
>> then we would wake up and just call __cmpwait_relaxed_timeout() again.
>
> Fair enough, I can see that. Is it worth capping the maximum timeout
> like we do for udelay()?
The DELAY_CONST_MAX thing?
So, I'm not sure whether your concern is about the overall timeout or
about the length of each individual WFET iteration?
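(To restate the loop structure -- a simplified sketch, not the literal
code from the series; the clock used and whether cpu_poll_relax() takes
a remaining time or an absolute deadline are assumptions on my part,
and the SMP_TIMEOUT_POLL_COUNT spinning is elided:

#define __smp_cond_timeout_sketch(ptr, cond_expr, timeout_ns)		\
({									\
	typeof(*(ptr)) VAL;						\
	u64 __end = local_clock_noinstr() + (timeout_ns);		\
	u64 __now;							\
									\
	for (;;) {							\
		VAL = READ_ONCE(*(ptr));				\
		if (cond_expr)						\
			break;						\
		__now = local_clock_noinstr();				\
		if (__now >= __end)					\
			break;						\
		/* WFE/WFET for the remaining time */			\
		cpu_poll_relax(ptr, VAL, __end - __now);		\
	}								\
	(typeof(*(ptr)))VAL;						\
})

So an evtstrm wakeup out of WFE just falls out of __cmpwait_relaxed(),
re-checks the condition and the clock, and goes right back to waiting.)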
For the overall limit, at least rqspinlock has a pretty large timeout
value (NSEC_PER_SEC/4).
However, it might be a good idea to attach a DELAY_CONST_MAX-like limit
when using this interface -- for architectures that do not have an
optimized way of polling (i.e. that do not define ARCH_HAS_CPU_RELAX).
(Currently only x86 defines ARCH_HAS_CPU_RELAX, but I have a series meant
to go after this one that renames it to ARCH_HAS_OPTIMIZED_POLL and
selects it for both x86 and arm64 [1].)
But that still might mean that we could have fairly long WFET iterations.
Do you foresee a problem with that?
[1] https://lore.kernel.org/lkml/20250218213337.377987-1-ankur.a.arora@oracle.com/
Thanks
--
ankur