[PATCH bpf-next 3/3] riscv, bpf: Add 32 bit atomic operations to RV32 JIT
bot+bpf-ci at kernel.org
Wed Apr 29 06:13:29 PDT 2026
> diff --git a/arch/riscv/net/bpf_jit_comp32.c b/arch/riscv/net/bpf_jit_comp32.c
> index f8509950fed4..710c56b20f98 100644
> --- a/arch/riscv/net/bpf_jit_comp32.c
> +++ b/arch/riscv/net/bpf_jit_comp32.c
[ ... ]
> @@ -902,11 +902,40 @@ static int emit_store_r64(const s8 *dst, const s8 *src, s16 off,
> case BPF_MEM:
> emit(rv_sw(RV_REG_T0, 0, lo(rs)), ctx);
> break;
> - case BPF_ATOMIC: /* Only BPF_ADD supported */
> - emit(rv_amoadd_w(RV_REG_ZERO, lo(rs), RV_REG_T0, 0, 0),
> - ctx);
> + case BPF_ATOMIC:
> + {
> + s8 fetch_reg = (imm & BPF_FETCH) ? lo(rs) : RV_REG_ZERO;
> +
> + switch (imm) {
> + case BPF_ADD:
> + case BPF_ADD | BPF_FETCH:
> + emit(rv_amoadd_w(fetch_reg, lo(rs), RV_REG_T0, 0, 0), ctx);
> + break;
> + case BPF_AND:
> + case BPF_AND | BPF_FETCH:
> + emit(rv_amoand_w(fetch_reg, lo(rs), RV_REG_T0, 0, 0), ctx);
> + break;
> + case BPF_OR:
> + case BPF_OR | BPF_FETCH:
> + emit(rv_amoor_w(fetch_reg, lo(rs), RV_REG_T0, 0, 0), ctx);
> + break;
> + case BPF_XOR:
> + case BPF_XOR | BPF_FETCH:
> + emit(rv_amoxor_w(fetch_reg, lo(rs), RV_REG_T0, 0, 0), ctx);
> + break;
> + case BPF_XCHG:
> + emit(rv_amoswap_w(fetch_reg, lo(rs), RV_REG_T0, 0, 0), ctx);
> + break;
Do the aq=0, rl=0 arguments provide the correct memory ordering for the
FETCH variants and XCHG?
Looking at the BPF specification in include/linux/filter.h, these BPF
atomic operations map to Linux kernel atomic_fetch_* and atomic_xchg
primitives, which require full memory ordering.
The RV64 BPF JIT in arch/riscv/net/bpf_jit_comp64.c emits these same
operations with aq=1, rl=1 for FETCH variants and XCHG:
	case BPF_ADD | BPF_FETCH:
		emit(is64 ? rv_amoadd_d(rs, rs, rd, 1, 1) :
		     rv_amoadd_w(rs, rs, rd, 1, 1), ctx);
		break;
	case BPF_XCHG:
		emit(is64 ? rv_amoswap_d(rs, rs, rd, 1, 1) :
		     rv_amoswap_w(rs, rs, rd, 1, 1), ctx);
		break;
The kernel's arch_atomic_fetch_add in arch/riscv/include/asm/atomic.h also
uses amoadd.w.aqrl (which corresponds to aq=1, rl=1).
With aq=0, rl=0, the atomic operations use relaxed memory ordering, which
could allow loads and stores to move across the atomic. Could this cause
BPF programs using atomic_fetch_*/atomic_xchg on RV32 to observe
memory-ordering violations that wouldn't occur on other architectures?
Note: For the non-FETCH variants of BPF_ADD/AND/OR/XOR (the return-void
operations), aq=0, rl=0 appears correct since those map to relaxed
atomic primitives.
---
AI reviewed your patch. Please fix the bug or reply explaining why it is not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md
CI run summary: https://github.com/kernel-patches/bpf/actions/runs/25109627626