[PATCH RFC] Avoid memory barrier in read_seqcount() through load acquire

Waiman Long longman at redhat.com
Tue Aug 13 12:01:36 PDT 2024


On 8/13/24 14:26, Christoph Lameter via B4 Relay wrote:
> From: "Christoph Lameter (Ampere)" <cl at gentwo.org>
>
> Some architectures support load acquire which can save us a memory
> barrier and save some cycles.
>
> A typical sequence
>
> 	do {
> 		seq = read_seqcount_begin(&s);
> 		<something>
> 	} while (read_seqcount_retry(&s, seq));
>
> requires 13 cycles on ARM64 for an empty loop. Two read memory barriers are
> needed. One for each of the seqcount_* functions.
>
> We can replace the first read barrier with a load acquire of
> the seqcount which saves us one barrier.
>
> On ARM64 doing so reduces the cycle count from 13 to 8.
>
> Signed-off-by: Christoph Lameter (Ampere) <cl at gentwo.org>
> ---
>   arch/Kconfig            |  5 +++++
>   arch/arm64/Kconfig      |  1 +
>   include/linux/seqlock.h | 41 +++++++++++++++++++++++++++++++++++++++++
>   3 files changed, 47 insertions(+)
>
> diff --git a/arch/Kconfig b/arch/Kconfig
> index 975dd22a2dbd..3f8867110a57 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -1600,6 +1600,11 @@ config ARCH_HAS_KERNEL_FPU_SUPPORT
>   	  Architectures that select this option can run floating-point code in
>   	  the kernel, as described in Documentation/core-api/floating-point.rst.
>   
> +config ARCH_HAS_ACQUIRE_RELEASE
> +	bool
> +	help
> +	  Architectures that support acquire / release can avoid memory fences
> +
>   source "kernel/gcov/Kconfig"
>   
>   source "scripts/gcc-plugins/Kconfig"
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index a2f8ff354ca6..19e34fff145f 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -39,6 +39,7 @@ config ARM64
>   	select ARCH_HAS_PTE_DEVMAP
>   	select ARCH_HAS_PTE_SPECIAL
>   	select ARCH_HAS_HW_PTE_YOUNG
> +	select ARCH_HAS_ACQUIRE_RELEASE
>   	select ARCH_HAS_SETUP_DMA_OPS
>   	select ARCH_HAS_SET_DIRECT_MAP
>   	select ARCH_HAS_SET_MEMORY

Do we need a new ARCH flag? I believe barrier APIs like
smp_load_acquire() already fall back to a full barrier on architectures
that don't define their own smp_load_acquire().
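
To illustrate the fallback I have in mind, here is a rough userspace
sketch using C11 atomics. It is not the kernel's definition, only a model
of the idea: MODEL_HAS_NATIVE_ACQUIRE and model_load_acquire() are
hypothetical names, and the seq_cst fence merely stands in for a full
barrier after a plain load.

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Userspace model, not kernel code. MODEL_HAS_NATIVE_ACQUIRE and
 * model_load_acquire() are hypothetical names used for illustration.
 */
#ifdef MODEL_HAS_NATIVE_ACQUIRE
/* The architecture provides a real load-acquire instruction. */
static inline unsigned int model_load_acquire(atomic_uint *p)
{
	return atomic_load_explicit(p, memory_order_acquire);
}
#else
/* Generic fallback: plain load followed by a full fence. */
static inline unsigned int model_load_acquire(atomic_uint *p)
{
	unsigned int v = atomic_load_explicit(p, memory_order_relaxed);

	atomic_thread_fence(memory_order_seq_cst);
	return v;
}
#endif

/* Both variants return the same value; only the ordering cost differs. */
static void model_self_check(void)
{
	atomic_uint x;

	atomic_init(&x, 7);
	assert(model_load_acquire(&x) == 7);
}
```

With a fallback like this, callers can use the acquire form
unconditionally. Whether it is actually cheaper than a read barrier then
depends on the architecture.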

BTW, acquire/release operations can be considered memory barriers too.
Maybe you are talking about preferring acquire/release barriers over
read/write barriers. Right?
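
For reference, this is how I read the two reader variants, as a userspace
sketch with C11 atomics. It is not the seqlock.h code: struct seq_model
and all function names are hypothetical, and the acquire fence only
stands in for a read barrier.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace model only; names are hypothetical, not the seqlock API. */
struct seq_model {
	atomic_uint sequence;	/* odd while a write is in progress */
	atomic_int data;	/* payload, accessed with relaxed atomics */
};

/* Variant A: plain load, then a fence standing in for the read barrier. */
static unsigned int begin_rmb(struct seq_model *s)
{
	unsigned int seq;

	do {
		seq = atomic_load_explicit(&s->sequence, memory_order_relaxed);
	} while (seq & 1);
	atomic_thread_fence(memory_order_acquire);
	return seq;
}

/* Variant B: the load itself carries acquire ordering, no extra fence. */
static unsigned int begin_acquire(struct seq_model *s)
{
	unsigned int seq;

	do {
		seq = atomic_load_explicit(&s->sequence, memory_order_acquire);
	} while (seq & 1);
	return seq;
}

/* The retry side keeps its barrier in both variants. */
static bool retry(struct seq_model *s, unsigned int seq)
{
	atomic_thread_fence(memory_order_acquire);
	return atomic_load_explicit(&s->sequence, memory_order_relaxed) != seq;
}

/* Writer: make the count odd, store the payload, make it even again. */
static void write_value(struct seq_model *s, int v)
{
	atomic_fetch_add_explicit(&s->sequence, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&s->data, v, memory_order_relaxed);
	atomic_fetch_add_explicit(&s->sequence, 1, memory_order_release);
}

/* Reader loop, using either begin variant. */
static int read_value(struct seq_model *s, bool use_acquire)
{
	unsigned int seq;
	int v;

	do {
		seq = use_acquire ? begin_acquire(s) : begin_rmb(s);
		v = atomic_load_explicit(&s->data, memory_order_relaxed);
	} while (retry(s, seq));
	return v;
}

/* Single-threaded sanity check: both variants see the same value. */
static void seq_model_self_check(void)
{
	struct seq_model s;

	atomic_init(&s.sequence, 0);
	atomic_init(&s.data, 0);
	write_value(&s, 42);
	assert(read_value(&s, false) == 42);
	assert(read_value(&s, true) == 42);
}
```

Both variants order the later data reads after the sequence read. The
only difference is whether that ordering comes from a separate barrier or
from the load itself.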

Cheers,
Longman

More information about the linux-arm-kernel mailing list