[PATCH v10 00/12] barrier: Add smp_cond_load_{relaxed,acquire}_timeout()

Ankur Arora ankur.a.arora at oracle.com
Mon Mar 16 15:08:07 PDT 2026


Andrew Morton <akpm at linux-foundation.org> writes:

> On Sun, 15 Mar 2026 18:36:39 -0700 Ankur Arora <ankur.a.arora at oracle.com> wrote:
>
>> Hi,
>>
>> This series adds waited variants of the smp_cond_load() primitives:
>> smp_cond_load_relaxed_timeout(), and smp_cond_load_acquire_timeout().
>>
>> ...
>>
>
> How are we to determine that this change is successful, useful, etc?

Good point. So this series was split off from this one here:
  https://lore.kernel.org/lkml/20250218213337.377987-1-ankur.a.arora@oracle.com/

The series enables ARCH_HAS_CPU_RELAX on arm64, which should allow
relatively cheap polling in idle.
However, it does need a few more patches from the series above to do that.

> Reduced CPU consumption?  Reduced energy usage?  Improved latencies?

With the additional patches this should improve wakeup latency:

  I ran the sched-pipe test with processes on VCPUs 4 and 5 with
  kvm-arm.wfi_trap_policy=notrap.

  # perf stat -r 5 --cpu 4,5 -e task-clock,cycles,instructions,sched:sched_wake_idle_without_ipi \
  perf bench sched pipe -l 1000000 -c 4

  # No haltpoll (and no TIF_POLLING_NRFLAG):

  Performance counter stats for 'CPU(s) 4,5' (5 runs):

         25,229.57 msec task-clock                       #    2.000 CPUs utilized               ( +-  7.75% )
    45,821,250,284      cycles                           #    1.816 GHz                         ( +- 10.07% )
    26,557,496,665      instructions                     #    0.58  insn per cycle              ( +-  0.21% )
                 0      sched:sched_wake_idle_without_ipi #    0.000 /sec

       12.615 +- 0.977 seconds time elapsed  ( +-  7.75% )


  # Haltpoll:

  Performance counter stats for 'CPU(s) 4,5' (5 runs):

         15,131.58 msec task-clock                       #    2.000 CPUs utilized               ( +- 10.00% )
    34,158,188,839      cycles                           #    2.257 GHz                         ( +-  6.91% )
    20,824,950,916      instructions                     #    0.61  insn per cycle              ( +-  0.09% )
         1,983,822      sched:sched_wake_idle_without_ipi #  131.105 K/sec                       ( +-  0.78% )

        7.566 +- 0.756 seconds time elapsed  ( +- 10.00% )

  We get a decent boost just because we are executing ~20% fewer
  instructions. I'm not sure how CPU frequency scaling works in a VM,
  but we also run at a higher frequency.

(That specifically applies to guests, but that series also enables this
with acpi-idle for bare metal.)

(From: https://lore.kernel.org/lkml/877c9zhk68.fsf@oracle.com/)

>> Finally update poll_idle() and resilient queued spinlocks to use them.
>
> Have you identified other suitable sites for conversion?

I haven't found other places in the core kernel where this could be used.
I think one reason is that the typical kernel wait is unbounded.

There are some places in drivers/ that have this pattern. For instance, I
think __arm_smmu_cmdq_poll_until_msi() in drivers/iommu/arm/arm-smmu-v3
could be converted.

However, as David Laight pointed out in this thread
(https://lore.kernel.org/lkml/20260214113122.70627a8b@pumpkin/),
this would be fine so long as the polling is on memory, but would
need some work to handle MMIO.
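
To illustrate the shape of such a conversion (a sketch only --
timeout_expired() and timeout_expr here are placeholders, and the exact
form of the timeout argument depends on the final interface):

```
	/* before: open-coded bounded poll */
	while (!(val = READ_ONCE(*ptr)) && !timeout_expired())
		cpu_relax();

	/* after (sketch) */
	val = smp_cond_load_relaxed_timeout(ptr, VAL, timeout_expr);
```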

>>  Documentation/atomic_t.txt           | 14 +++--
>>  arch/arm64/Kconfig                   |  3 +
>>  arch/arm64/include/asm/barrier.h     | 23 +++++++
>>  arch/arm64/include/asm/cmpxchg.h     | 62 +++++++++++++++----
>>  arch/arm64/include/asm/delay-const.h | 27 +++++++++
>>  arch/arm64/include/asm/rqspinlock.h  | 85 --------------------------
>>  arch/arm64/lib/delay.c               | 15 ++---
>>  drivers/cpuidle/poll_state.c         | 21 +------
>>  drivers/soc/qcom/rpmh-rsc.c          |  8 +--
>>  include/asm-generic/barrier.h        | 90 ++++++++++++++++++++++++++++
>>  include/linux/atomic.h               | 10 ++++
>>  include/linux/atomic/atomic-long.h   | 18 +++---
>>  include/linux/sched/idle.h           | 29 +++++++++
>>  kernel/bpf/rqspinlock.c              | 77 +++++++++++++++---------
>>  scripts/atomic/gen-atomic-long.sh    | 16 +++--
>>  15 files changed, 320 insertions(+), 178 deletions(-)
>>  create mode 100644 arch/arm64/include/asm/delay-const.h
>
> Some sort of testing in lib/tests/ would be appropriate and useful.

Makes sense. Will add.

Thanks
--
ankur
