[PATCH] riscv/atomic.h: optimize ops with acquire/release ordering

Andrea Parri parri.andrea at gmail.com
Sun May 5 15:45:58 PDT 2024


Hi Puranjay,

On Sun, May 05, 2024 at 12:33:40PM +0000, Puranjay Mohan wrote:
> Currently, atomic ops with acquire or release ordering are implemented
> as atomic ops with relaxed ordering followed by or preceded by an
> acquire fence or a release fence.
> 
> Section 8.1 of "The RISC-V Instruction Set Manual Volume I:
> Unprivileged ISA", titled "Specifying Ordering of Atomic Instructions",
> says:
> 
> | To provide more efficient support for release consistency [5], each
> | atomic instruction has two bits, aq and rl, used to specify additional
> | memory ordering constraints as viewed by other RISC-V harts.
> 
> and
> 
> | If only the aq bit is set, the atomic memory operation is treated as
> | an acquire access.
> | If only the rl bit is set, the atomic memory operation is treated as a
> | release access.
> 
> So, rather than using two instructions (relaxed atomic op + fence), use
> a single atomic op instruction with acquire/release ordering.
> 
> Example program:
> 
>   atomic_t cnt = ATOMIC_INIT(0);
>   atomic_fetch_add_acquire(1, &cnt);
>   atomic_fetch_add_release(1, &cnt);
> 
> Before:
> 
>   amoadd.w        a4,a5,(a4)  // Atomic add with relaxed ordering
>   fence   r,rw                // Fence to force Acquire ordering
> 
>   fence   rw,w                // Fence to force Release ordering
>   amoadd.w        a4,a5,(a4)  // Atomic add with relaxed ordering
> 
> After:
> 
>   amoadd.w.aq     a4,a5,(a4)  // Atomic add with Acquire ordering
> 
>   amoadd.w.rl     a4,a5,(a4)  // Atomic add with Release ordering
> 
> Signed-off-by: Puranjay Mohan <puranjay at kernel.org>

Your changes effectively amount to a partial revert of:

  5ce6c1f3535fa ("riscv/atomic: Strengthen implementations with fences")

Can you please provide (and possibly include in the changelog of v2) a more
thorough explanation of why such a revert is correct?

(Anticipating a somewhat non-trivial analysis...)

Have you tried your changes on some actual hardware?  How did they perform?
Anything worth mentioning (besides the mere instruction count)?

  Andrea
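
For readers following the thread, the two code shapes under discussion can
be sketched in portable C11. This is only an illustration, not the kernel's
implementation: the function names below are hypothetical, and C11 ordering
semantics are not the same as those of the Linux-kernel memory model, so the
sketch shows the shape of the change rather than settling the correctness
question raised above. With current RISC-V toolchains the "fenced" variants
are expected to compile to the "Before" sequences quoted above and the
"annotated" variants to the "After" ones, but that mapping is an assumption
to be checked against the actual compiler output.

```c
#include <stdatomic.h>

/* "Before" shape: relaxed read-modify-write followed by an acquire fence. */
static inline int fetch_add_acquire_fenced(atomic_int *v, int i)
{
    int old = atomic_fetch_add_explicit(v, i, memory_order_relaxed);
    atomic_thread_fence(memory_order_acquire);
    return old;
}

/* "After" shape: the acquire ordering is attached to the operation itself. */
static inline int fetch_add_acquire_annotated(atomic_int *v, int i)
{
    return atomic_fetch_add_explicit(v, i, memory_order_acquire);
}

/* "Before" shape: release fence followed by a relaxed read-modify-write. */
static inline int fetch_add_release_fenced(atomic_int *v, int i)
{
    atomic_thread_fence(memory_order_release);
    return atomic_fetch_add_explicit(v, i, memory_order_relaxed);
}

/* "After" shape: the release ordering is attached to the operation itself. */
static inline int fetch_add_release_annotated(atomic_int *v, int i)
{
    return atomic_fetch_add_explicit(v, i, memory_order_release);
}
```

All four helpers return the value the counter held before the addition, so
they are interchangeable in single-threaded use. They can differ only in
which reorderings of surrounding accesses other threads may observe, which
is the part that needs the analysis requested above.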


