[RESEND PATCH v7 1/7] asm-generic: barrier: Add smp_cond_load_relaxed_timeout()

Arnd Bergmann arnd at arndb.de
Tue Oct 28 02:42:49 PDT 2025


On Tue, Oct 28, 2025, at 06:31, Ankur Arora wrote:

> + */
> +#ifndef smp_cond_load_relaxed_timeout
> +#define smp_cond_load_relaxed_timeout(ptr, cond_expr, time_check_expr)	\
> +({									\
> +	typeof(ptr) __PTR = (ptr);					\
> +	__unqual_scalar_typeof(*ptr) VAL;				\
> +	u32 __n = 0, __spin = SMP_TIMEOUT_POLL_COUNT;			\
> +									\
> +	for (;;) {							\
> +		VAL = READ_ONCE(*__PTR);				\
> +		if (cond_expr)						\
> +			break;						\
> +		cpu_poll_relax(__PTR, VAL);				\
> +		if (++__n < __spin)					\
> +			continue;					\
> +		if (time_check_expr) {					\
> +			VAL = READ_ONCE(*__PTR);			\
> +			break;						\
> +		}							\
> +		__n = 0;						\
> +	}								\
> +	(typeof(*ptr))VAL;						\
> +})
> +#endif

I'm trying to think of ideas for how this would be done on arm64
with FEAT_WFXT in a way that doesn't hurt other architectures.

The best idea I've come up with is to change that inner loop
to combine the cpu_poll_relax() with the time check, and then
define 'time_check_expr' so that it returns an approximate
(ceiling) number of nanoseconds of remaining time, or zero if
the timeout has expired.
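The restructured loop can be modelled in user space as a rough sketch. It assumes that time_check_expr yields a ceiling of the remaining nanoseconds (0 once expired), and that a hypothetical cpu_poll_relax_timeout() hook waits for at most that long before returning. The hooks are function pointers here only so the sketch compiles and runs outside the kernel. The names cond_load_relaxed_timeout(), struct sim and the sim_*() helpers are all made up for illustration, and the fake clock stands in for real time and for a remote writer.

```c
#include <assert.h>     /* assert() for the usage checks */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Condition on the loaded value (plays the role of cond_expr). */
typedef bool (*cond_fn)(uint64_t val, void *ctx);
/* Ceiling of remaining time in ns, 0 once expired (time_check_expr). */
typedef uint64_t (*time_left_fn)(void *ctx);
/* Wait while *ptr == val for at most ns; may return early. */
typedef void (*relax_fn)(const _Atomic uint64_t *ptr, uint64_t val,
			 uint64_t ns, void *ctx);

static uint64_t cond_load_relaxed_timeout(const _Atomic uint64_t *ptr,
					  cond_fn cond, time_left_fn time_left,
					  relax_fn relax, void *ctx)
{
	uint64_t val;

	for (;;) {
		val = atomic_load_explicit(ptr, memory_order_relaxed);
		if (cond(val, ctx))
			break;
		uint64_t ns = time_left(ctx);
		if (!ns) {
			/* Expired: hand back a fresh value, like the quoted macro. */
			val = atomic_load_explicit(ptr, memory_order_relaxed);
			break;
		}
		/* The time check feeds the wait itself, no spin counter. */
		relax(ptr, val, ns, ctx);
	}
	return val;
}

/* Fake environment: a clock, a deadline and a "remote CPU" write. */
struct sim {
	_Atomic uint64_t flag;
	uint64_t now_ns, deadline_ns, write_at_ns, tick_ns;
};

static bool sim_cond(uint64_t val, void *ctx)
{
	(void)ctx;
	return val != 0;
}

static uint64_t sim_time_left(void *ctx)
{
	struct sim *s = ctx;
	return s->now_ns >= s->deadline_ns ? 0 : s->deadline_ns - s->now_ns;
}

/* Advance the clock by min(ns, tick_ns), like a bounded WFET wait. */
static void sim_relax(const _Atomic uint64_t *ptr, uint64_t val,
		      uint64_t ns, void *ctx)
{
	struct sim *s = ctx;
	(void)ptr;
	(void)val;
	s->now_ns += ns < s->tick_ns ? ns : s->tick_ns;
	if (s->now_ns >= s->write_at_ns)
		atomic_store_explicit(&s->flag, 1, memory_order_relaxed);
}
```

Because the wait is bounded by the remaining time, the loop cannot overshoot the deadline by more than one wake-up, which is the property the combined hook is meant to provide.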

The FEAT_WFXT version would then look something like

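static inline void __cmpwait_u64_timeout(volatile u64 *ptr, unsigned long val, u64 ns)
{
	unsigned long tmp;

	/*
	 * Same sequence as the existing arm64 __cmpwait(), with the final
	 * WFE replaced by WFET. WFET expects an absolute CNTVCT_EL0
	 * deadline, so 'ns' would first need converting to one (not shown).
	 */
	asm volatile(
	"	sevl\n"
	"	wfe\n"
	"	ldxr	%[tmp], %[v]\n"
	"	eor	%[tmp], %[tmp], %[val]\n"
	"	cbnz	%[tmp], 1f\n"
	"	wfet	%[ns]\n"
	"1:"
	: [tmp] "=&r" (tmp), [v] "+Q" (*ptr)
	: [val] "r" (val), [ns] "r" (ns));
}
#define cpu_poll_relax_timeout_wfet(__PTR, VAL, TIMECHECK)	\
({								\
	u64 __t = (TIMECHECK);					\
	if (__t)						\
		__cmpwait_u64_timeout(__PTR, VAL, __t);		\
})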

while the 'wfe' version would continue to do the timecheck after the
wait.
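One detail a WFET-based variant has to handle: WFET does not take a relative timeout. It waits until the arm64 virtual counter (CNTVCT_EL0) reaches the value in its register operand, so a "nanoseconds remaining" value from the time check would first have to become an absolute counter deadline. A minimal C sketch of that arithmetic, with a hypothetical ns_to_counter_deadline() helper; the counter frequency is passed in here, whereas in the kernel it would presumably come from CNTFRQ_EL0.

```c
#include <assert.h>     /* assert() for the usage checks */
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/*
 * Hypothetical helper: convert "ns remaining" into the absolute counter
 * value WFET waits for. Rounds the tick count up, matching the ceiling
 * semantics of the time check, and saturates at UINT64_MAX instead of
 * wrapping. Assumes rate_hz <= NSEC_PER_SEC so rem * rate_hz fits in
 * 64 bits.
 */
static uint64_t ns_to_counter_deadline(uint64_t now, uint64_t ns,
				       uint64_t rate_hz)
{
	uint64_t secs = ns / NSEC_PER_SEC;
	uint64_t rem = ns % NSEC_PER_SEC;
	uint64_t cycles, frac;

	/* Whole seconds, with an overflow check before multiplying. */
	if (secs && rate_hz > UINT64_MAX / secs)
		return UINT64_MAX;
	cycles = secs * rate_hz;

	/* Remaining fraction of a second, rounded up. */
	frac = (rem * rate_hz + NSEC_PER_SEC - 1) / NSEC_PER_SEC;
	if (cycles > UINT64_MAX - frac)
		return UINT64_MAX;
	cycles += frac;

	if (now > UINT64_MAX - cycles)
		return UINT64_MAX;
	return now + cycles;
}
```

For example, with a 24 MHz counter, 1000 ns remaining from a counter value of 100 gives a deadline of 124, and even 1 ns rounds up to a full tick rather than to zero.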

I have two lesser concerns with the generic definition here:

- having both a timeout and a spin counter in the same loop
  feels redundant and error-prone, as the behavior in practice
  would likely depend a lot on the platform. What is the reason
  for keeping the counter if we already have a fixed timeout
  condition?

- I generally dislike type-agnostic macros like this one:
  they add a lot of extra complexity here that I feel could be
  avoided entirely if we made explicitly 32-bit and 64-bit
  wide versions of these macros. We probably won't be able
  to resolve this as part of your series, but ideally I'd like
  to have explicitly-typed versions of cmpxchg(), smp_load_acquire()
  and all the related ones, the same way we do for atomic_*()
  and atomic64_*().
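Both points can be illustrated with a rough sketch of what explicitly sized variants might look like. The names cond_load_relaxed_timeout_u32() and cond_load_relaxed_timeout_u64() are hypothetical, read_once_u32()/read_once_u64() and the empty cpu_relax() are user-space stand-ins for the kernel primitives, and the loop drops the spin counter so that time_check_expr is simply evaluated on every iteration where cond_expr is false. It relies on GCC/Clang statement expressions, as the kernel does.

```c
#include <assert.h>     /* assert() for the usage checks */
#include <stdint.h>

/* User-space stand-ins for READ_ONCE() and cpu_relax(). */
static inline uint32_t read_once_u32(const uint32_t *p)
{
	return *(const volatile uint32_t *)p;
}

static inline uint64_t read_once_u64(const uint64_t *p)
{
	return *(const volatile uint64_t *)p;
}

#define cpu_relax() do { } while (0)

/*
 * Hypothetical fixed-width variants: a wrongly typed pointer triggers
 * an incompatible-pointer diagnostic, and VAL has a known type without
 * needing __unqual_scalar_typeof().
 */
#define cond_load_relaxed_timeout_u32(ptr, cond_expr, time_check_expr)	\
({									\
	const uint32_t *__p = (ptr);					\
	uint32_t VAL;							\
	for (;;) {							\
		VAL = read_once_u32(__p);				\
		if (cond_expr)						\
			break;						\
		if (time_check_expr) {					\
			VAL = read_once_u32(__p);			\
			break;						\
		}							\
		cpu_relax();						\
	}								\
	VAL;								\
})

#define cond_load_relaxed_timeout_u64(ptr, cond_expr, time_check_expr)	\
({									\
	const uint64_t *__p = (ptr);					\
	uint64_t VAL;							\
	for (;;) {							\
		VAL = read_once_u64(__p);				\
		if (cond_expr)						\
			break;						\
		if (time_check_expr) {					\
			VAL = read_once_u64(__p);			\
			break;						\
		}							\
		cpu_relax();						\
	}								\
	VAL;								\
})
```

The two bodies are deliberately identical apart from the types; the duplication is the price of dropping the type-generic machinery.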

       Arnd



More information about the linux-arm-kernel mailing list