[PATCH bpf-next 2/2] bpf: arm64: Optimize recursion detection by not using atomics
bot+bpf-ci at kernel.org
Wed Dec 17 08:56:40 PST 2025
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 2da986136d26..654fb94bf60c 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
[ ... ]
> @@ -2006,12 +2009,32 @@ struct bpf_struct_ops_common_value {
>
> static inline bool bpf_prog_get_recursion_context(struct bpf_prog *prog)
> {
> +#ifdef CONFIG_ARM64
> + u8 rctx = interrupt_context_level();
> + u8 *active = this_cpu_ptr(prog->active);
> +
> + active[rctx]++;
> + barrier();
> + if (get_unaligned_le32(active) != BIT(rctx * 8))
> + return false;
> +
> + return true;
> +#else
> return this_cpu_inc_return(*(prog->active)) == 1;
> +#endif
> }
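For context on what the check encodes: as I read the patch, prog->active on
arm64 becomes one u8 counter per interrupt context packed into a 32-bit word,
and get_unaligned_le32(active) == BIT(rctx * 8) means "only my context's
counter is non-zero, and it is exactly 1". A quick worked example, assuming
BPF_NR_CONTEXTS is 4 (matching the 32-bit read) and that
interrupt_context_level() returns 0/1/2/3 for task/softirq/hardirq/NMI:

	/* softirq entry, nothing else running this prog on the CPU:
	 *   active[] = { 0, 1, 0, 0 }
	 *   get_unaligned_le32(active) == 0x00000100 == BIT(1 * 8) -> true
	 * softirq entry while a task-context run is in flight:
	 *   active[] = { 1, 1, 0, 0 }
	 *   get_unaligned_le32(active) == 0x00000101 != BIT(1 * 8) -> false
	 */

The apparent intent is that each byte is only modified from one context level,
so the plain increments cannot be corrupted by interrupts, while the combined
32-bit read still rejects any nesting, same as the old single counter.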
Can preemption between the increment and the check cause a counter leak on
CONFIG_PREEMPT_RCU kernels? The trampoline calls this function under
rcu_read_lock_dont_migrate(), which disables migration but still allows
preemption on PREEMPT_RCU (documented at include/linux/rcupdate.h:856).
Consider this scenario on an ARM64 system with PREEMPT_RCU:
1. Thread A increments active[0] to 1
2. Preemption occurs before Thread A reaches the check
3. Thread B on same CPU increments active[0] to 2
4. Thread B checks: sees 2 != BIT(0), returns false
5. Thread A resumes, checks: sees 2 != BIT(0), returns false
6. Both threads return false, neither runs BPF
7. Neither calls bpf_prog_put_recursion_context() (see
__bpf_prog_enter_recur() at kernel/bpf/trampoline.c:952)
8. Counter permanently stuck at 2, so every later task-context run of this
program on that CPU fails the recursion check
The old atomic code did not have this problem because this_cpu_inc_return()
performs the increment and the read as a single operation, so Thread A is
guaranteed to observe the value 1 before Thread B can run. With the split
non-atomic sequence, Thread A increments but has not yet read the combined
value, so Thread B's increment makes both threads observe 2 and both fail.
Is there a guarantee that same-context preemption cannot occur in the BPF
execution path on ARM64, or does this need additional synchronization?
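If such a guarantee does not exist, one way to close the window between the
increment and the check, while still avoiding the atomic RMW this patch is
trying to eliminate, might be to disable preemption around the pair. A rough,
untested sketch of that idea, reusing the names from the patch:

static inline bool bpf_prog_get_recursion_context(struct bpf_prog *prog)
{
	u8 rctx = interrupt_context_level();
	u8 *active;
	bool ret;

	preempt_disable();
	active = this_cpu_ptr(prog->active);
	/* per-context increment, now not interleavable with another task */
	active[rctx]++;
	barrier();
	/* success only if this context's byte is 1 and all others are 0 */
	ret = get_unaligned_le32(active) == BIT(rctx * 8);
	preempt_enable();

	return ret;
}

That only closes the increment/check window, of course; if a failed
bpf_prog_get_recursion_context() really is never paired with a put, the
loser's increment would still need to be unwound somewhere.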
> @@ -2006,12 +2009,32 @@ struct bpf_struct_ops_common_value {
>
> static inline void bpf_prog_put_recursion_context(struct bpf_prog *prog)
> {
> +#ifdef CONFIG_ARM64
> + u8 rctx = interrupt_context_level();
> + u8 *active = this_cpu_ptr(prog->active);
> +
> + barrier();
> + active[rctx]--;
> +#else
> this_cpu_dec(*(prog->active));
> +#endif
> }
The barrier() is placed before the decrement. Does this provide the needed
ordering guarantee, or should there also be a barrier after the decrement
to ensure visibility?
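For comparison, if compiler ordering on both sides of the decrement is what is
wanted (the counters are per-CPU, so as far as I can tell this is about
ordering against later code on the same CPU, e.g. an interrupt, rather than
SMP visibility), the symmetric form would presumably look like:

static inline void bpf_prog_put_recursion_context(struct bpf_prog *prog)
{
	u8 rctx = interrupt_context_level();
	u8 *active = this_cpu_ptr(prog->active);

	barrier();	/* keep the protected section before the release */
	active[rctx]--;
	barrier();	/* and the decrement before whatever follows */
}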
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index c66316e32563..b5063acfcf92 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
[ ... ]
> @@ -112,7 +112,8 @@ struct bpf_prog *bpf_prog_alloc_no_stats(unsigned int size, gfp_t gfp_extra_flag
> vfree(fp);
> return NULL;
> }
> - fp->active = alloc_percpu_gfp(int, bpf_memcg_flags(GFP_KERNEL | gfp_extra_flags));
> + fp->active = __alloc_percpu_gfp(sizeof(u8[BPF_NR_CONTEXTS]), 8,
> + bpf_memcg_flags(GFP_KERNEL | gfp_extra_flags));
The allocation changes to u8[BPF_NR_CONTEXTS] for all architectures, but the
non-ARM64 code path still uses *(prog->active), which now dereferences only
the first byte. This shrinks the counter from int (typically 4 bytes, range
~2^31) to u8 (1 byte, range 0-255). While 256 recursion levels is plenty in
practice, is the narrower counter intentional for non-ARM64 platforms, or
should they keep a full int-sized counter?
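If the wider counter should be kept elsewhere, the allocation could presumably
stay conditional alongside the helpers; a hypothetical sketch only (the type of
prog->active and the get/put helpers would need matching #ifdefs):

#ifdef CONFIG_ARM64
	fp->active = __alloc_percpu_gfp(sizeof(u8[BPF_NR_CONTEXTS]), 8,
					bpf_memcg_flags(GFP_KERNEL | gfp_extra_flags));
#else
	fp->active = __alloc_percpu_gfp(sizeof(int), __alignof__(int),
					bpf_memcg_flags(GFP_KERNEL | gfp_extra_flags));
#endif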
---
AI reviewed your patch. Please fix the bug or reply by email explaining why it is not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md
CI run summary: https://github.com/kernel-patches/bpf/actions/runs/20310266487