[PATCH V2] asm-generic: ticket-lock: Optimize arch_spin_value_unlocked

Mon Jul 31 08:16:34 PDT 2023

On 7/30/23 22:33, guoren at kernel.org wrote:
> From: Guo Ren <guoren at linux.alibaba.com>
>
> The arch_spin_value_unlocked would cause an unnecessary memory
> access to the contended value. Although it won't cause a significant
> performance gap in most architectures, the arch_spin_value_unlocked
> argument contains enough information. Thus, remove unnecessary
> atomic_read in arch_spin_value_unlocked().
>
> The caller of arch_spin_value_unlocked() could benefit from this
> change. Currently, the only caller is lockref.
>
> Signed-off-by: Guo Ren <guoren at kernel.org>
> Cc: Waiman Long <longman at redhat.com>
> Cc: David Laight <David.Laight at ACULAB.COM>
> Cc: Peter Zijlstra <peterz at infradead.org>
> Signed-off-by: Guo Ren <guoren at linux.alibaba.com>
> ---
> Changelog
> V2:
>   - Fixup commit log with Waiman advice.
>   - Add Waiman comment in the commit msg.
> ---
>   include/asm-generic/spinlock.h | 16 +++++++++-------
>   1 file changed, 9 insertions(+), 7 deletions(-)
>
> diff --git a/include/asm-generic/spinlock.h b/include/asm-generic/spinlock.h
> index fdfebcb050f4..90803a826ba0 100644
> --- a/include/asm-generic/spinlock.h
> +++ b/include/asm-generic/spinlock.h
> @@ -68,11 +68,18 @@ static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
>   	smp_store_release(ptr, (u16)val + 1);
>   }
>   
> +static __always_inline int arch_spin_value_unlocked(arch_spinlock_t lock)
> +{
> +	u32 val = lock.counter;
> +
> +	return ((val >> 16) == (val & 0xffff));
> +}
> +
>   static __always_inline int arch_spin_is_locked(arch_spinlock_t *lock)
>   {
> -	u32 val = atomic_read(lock);
> +	arch_spinlock_t val = READ_ONCE(*lock);
>   
> -	return ((val >> 16) != (val & 0xffff));
> +	return !arch_spin_value_unlocked(val);
>   }
>   
>   static __always_inline int arch_spin_is_contended(arch_spinlock_t *lock)
> @@ -82,11 +89,6 @@ static __always_inline int arch_spin_is_contended(arch_spinlock_t *lock)
>   	return (s16)((val >> 16) - (val & 0xffff)) > 1;
>   }
>   
> -static __always_inline int arch_spin_value_unlocked(arch_spinlock_t lock)
> -{
> -	return !arch_spin_is_locked(&lock);
> -}
> -
>   #include <asm/qrwlock.h>
>   
>   #endif /* __ASM_GENERIC_SPINLOCK_H */

I am fine with the current change. However, modern optimizing compiler 
should be able to avoid the redundant memory read anyway. So this patch 
may not have an impact from the performance point of view.

Acked-by: Waiman Long <longman at redhat.com>