[PATCH v3] locking/osq: Use optimized spinning loop for arm64

Will Deacon will at kernel.org
Mon Jan 13 08:48:29 PST 2020


On Mon, Jan 13, 2020 at 10:07:35AM -0500, Waiman Long wrote:
> Arm64 has a more optimized spinning loop (atomic_cond_read_acquire)
> using wfe for spinlock that can boost performance of sibling threads
> by putting the current cpu to a wait state that is broken only when
> the monitored variable changes or an external event happens.
> 
> OSQ has a more complicated spinning loop. Besides the lock value, it
> also checks for need_resched() and vcpu_is_preempted(). The check for
> need_resched() is not a problem as it is only set by the tick interrupt
> handler. That will be detected by the spinning cpu right after iret.
> 
> The vcpu_is_preempted() check, however, is a problem as changes to the
> preempt state of the previous node will not affect the wait state. For
> ARM64, vcpu_is_preempted is not currently defined and so is a no-op.
> Will has indicated that he is planning to para-virtualize wfe instead
> of defining vcpu_is_preempted for PV support. So just add a comment in
> arch/arm64/include/asm/spinlock.h to indicate that vcpu_is_preempted()
> should not be defined as suggested.
> 
> On a 2-socket 56-core 224-thread ARM64 system, a kernel mutex locking
> microbenchmark was run for 10s with and without the patch. The
> performance numbers before the patch were:
> 
> Running locktest with mutex [runtime = 10s, load = 1]
> Threads = 224, Min/Mean/Max = 316/123,143/2,121,269
> Threads = 224, Total Rate = 2,757 kop/s; Percpu Rate = 12 kop/s
> 
> After the patch, the numbers were:
> 
> Running locktest with mutex [runtime = 10s, load = 1]
> Threads = 224, Min/Mean/Max = 334/147,836/1,304,787
> Threads = 224, Total Rate = 3,311 kop/s; Percpu Rate = 15 kop/s
> 
> So there was about a 20% performance improvement.
> 
> Signed-off-by: Waiman Long <longman at redhat.com>
> ---
>  arch/arm64/include/asm/spinlock.h |  9 +++++++++
>  kernel/locking/osq_lock.c         | 17 ++++-------------
>  2 files changed, 13 insertions(+), 13 deletions(-)
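
For readers without the diff at hand, the shape of the wait loop described
above can be sketched in userspace C. This is only an illustration under
stated assumptions, not the code from the patch: wait_for_handoff(),
stop_flag, stop_requested() and cpu_relax_hint() are hypothetical names,
stop_requested() merely stands in for the need_resched() check, and the
arm64 "yield" hint is used in place of the kernel's wfe-based wait, which
relies on an exclusive load to arm the event monitor.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag standing in for the need_resched() condition. */
static atomic_bool stop_flag;

static bool stop_requested(void)
{
	return atomic_load_explicit(&stop_flag, memory_order_relaxed);
}

/* Hypothetical helper: a spin hint; a no-op on non-arm64 builds. */
static inline void cpu_relax_hint(void)
{
#if defined(__aarch64__)
	__asm__ __volatile__("yield" ::: "memory");
#endif
}

/*
 * Spin until *locked becomes non-zero or the waiter is asked to stop.
 * Returns true if the lock was handed over, false if the waiter
 * should give up spinning.
 */
static bool wait_for_handoff(atomic_int *locked)
{
	assert(locked != NULL);

	for (;;) {
		/* Acquire ordering pairs with the releasing store of the previous holder. */
		if (atomic_load_explicit(locked, memory_order_acquire))
			return true;
		if (stop_requested())
			return false;
		cpu_relax_hint();
	}
}
```

The point of the sketch is that every exit condition is re-evaluated on each
wakeup of the loop. A condition that can change without waking the waiter,
as the commit message describes for vcpu_is_preempted(), would not be
noticed promptly by a wfe-style wait.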

Acked-by: Will Deacon <will at kernel.org>

Thanks,

Will


