[PATCH bpf-next 2/2] bpf: arm64: Optimize recursion detection by not using atomics
Yonghong Song
yonghong.song at linux.dev
Wed Dec 17 10:23:59 PST 2025
On 12/17/25 8:56 AM, bot+bpf-ci at kernel.org wrote:
>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
>> index 2da986136d26..654fb94bf60c 100644
>> --- a/include/linux/bpf.h
>> +++ b/include/linux/bpf.h
> [ ... ]
>
>> @@ -2006,12 +2009,32 @@ struct bpf_struct_ops_common_value {
>>
>> static inline bool bpf_prog_get_recursion_context(struct bpf_prog *prog)
>> {
>> +#ifdef CONFIG_ARM64
>> + u8 rctx = interrupt_context_level();
>> + u8 *active = this_cpu_ptr(prog->active);
>> +
>> + active[rctx]++;
>> + barrier();
>> + if (get_unaligned_le32(active) != BIT(rctx * 8))
>> + return false;
>> +
>> + return true;
>> +#else
>> return this_cpu_inc_return(*(prog->active)) == 1;
>> +#endif
>> }
> Can preemption between the increment and check cause a counter leak on
> CONFIG_PREEMPT_RCU systems? The trampoline calls this function under
> rcu_read_lock_dont_migrate() which allows preemption on PREEMPT_RCU
> (documented at include/linux/rcupdate.h:856).
>
> Consider this scenario on an ARM64 system with PREEMPT_RCU:
>
> 1. Thread A increments active[0] to 1
> 2. Preemption occurs before Thread A reaches the check
> 3. Thread B on same CPU increments active[0] to 2
> 4. Thread B checks: sees 2 != BIT(0), returns false
> 5. Thread A resumes, checks: sees 2 != BIT(0), returns false
> 6. Both threads return false, neither runs BPF
> 7. Neither calls bpf_prog_put_recursion_context() (see
> __bpf_prog_enter_recur() at kernel/bpf/trampoline.c:952)
> 8. Counter permanently stuck at 2, all future BPF on this CPU fails
>
> The old atomic code handled this correctly because this_cpu_inc_return()
> completes atomically, ensuring Thread A reads the value 1 before Thread B
> can interfere. With non-atomic operations, Thread A increments but hasn't
> read the value yet, allowing Thread B to interfere and cause both threads
> to see the modified value.
>
> Is there a guarantee that same-context preemption cannot occur in the BPF
> execution path on ARM64, or does this need additional synchronization?
The AI is correct here. See this other thread:
https://lore.kernel.org/bpf/20251217093326.1745307-1-chen.dylane@linux.dev/T/#m906fd4502fbbedd4609c586122a393363003312a
where preempt_disable() is needed to prevent the above scenario.
But adding preempt_disable() may hurt overall system-level performance.
Can this patch improve performance on *all* ARM64 CPU versions?
Do you have numbers showing how much performance improves?
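
For illustration only, steps 1-6 of the scenario above can be replayed
deterministically in user space. This is a minimal sketch, not kernel code:
the static active[] array stands in for one CPU's prog->active bytes, and
get_inc(), get_check(), load_le32(), run_interleaved() and
run_back_to_back() are hypothetical names. barrier() and real preemption
are not modeled; the interleaving is simply written out in program order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NR_CTX 4	/* one counter byte per context level, as in the patch */

/* Hypothetical user-space stand-in for one CPU's prog->active bytes. */
static uint8_t active[NR_CTX];

struct outcome {
	bool a_ok;	/* did thread A pass the recursion check? */
	bool b_ok;	/* did thread B pass the recursion check? */
};

/* Assemble the four counter bytes as one little-endian 32-bit word. */
static uint32_t load_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* First half of the patched get path: plain, non-atomic increment. */
static void get_inc(unsigned int rctx)
{
	assert(rctx < NR_CTX);
	active[rctx]++;
}

/* Second half: pass only if this entry is the sole active one. */
static bool get_check(unsigned int rctx)
{
	assert(rctx < NR_CTX);
	return load_le32(active) == (UINT32_C(1) << (rctx * 8));
}

/* Steps 1-6: B runs between A's increment and A's check. */
struct outcome run_interleaved(void)
{
	struct outcome r;

	memset(active, 0, sizeof(active));
	get_inc(0);		/* A: active[0] becomes 1 */
	get_inc(0);		/* B preempts A: active[0] becomes 2 */
	r.b_ok = get_check(0);	/* B sees 2 */
	r.a_ok = get_check(0);	/* A resumes and also sees 2 */
	return r;
}

/* Same two entries, but A's increment and check are not split. */
struct outcome run_back_to_back(void)
{
	struct outcome r;

	memset(active, 0, sizeof(active));
	get_inc(0);
	r.a_ok = get_check(0);	/* A sees 1 */
	get_inc(0);
	r.b_ok = get_check(0);	/* B sees 2 while A is still active */
	return r;
}
```

Under these assumptions run_interleaved() rejects both entries, while
run_back_to_back() admits A and rejects only B, which is what a single
this_cpu_inc_return() would give for the same arrival order.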
>
>> @@ -2006,12 +2009,32 @@ struct bpf_struct_ops_common_value {
>>
>> static inline void bpf_prog_put_recursion_context(struct bpf_prog *prog)
>> {
>> +#ifdef CONFIG_ARM64
>> + u8 rctx = interrupt_context_level();
>> + u8 *active = this_cpu_ptr(prog->active);
>> +
>> + barrier();
>> + active[rctx]--;
>> +#else
>> this_cpu_dec(*(prog->active));
>> +#endif
>> }
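
As a worked example of the byte-packed check in the quoted get/put pair,
the sketch below shows which 32-bit words result when entries at different
context levels overlap. It is a user-space approximation only: get_ctx(),
put_ctx(), active_word() and load_le32() are hypothetical helpers, the
static array stands in for one CPU's bytes, and barrier() is omitted
because nothing here runs concurrently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CTX 4	/* one counter byte per context level */

/* Hypothetical user-space stand-in for one CPU's prog->active bytes. */
static uint8_t active[NR_CTX];

/* Assemble the four counter bytes as one little-endian 32-bit word. */
static uint32_t load_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Mirrors the quoted get path: bump our byte, then compare the word. */
bool get_ctx(unsigned int rctx)
{
	assert(rctx < NR_CTX);
	active[rctx]++;
	return load_le32(active) == (UINT32_C(1) << (rctx * 8));
}

/* Mirrors the quoted put path: drop our byte again. */
void put_ctx(unsigned int rctx)
{
	assert(rctx < NR_CTX);
	active[rctx]--;
}

/* Current value of all four counters viewed as one word. */
uint32_t active_word(void)
{
	return load_le32(active);
}
```

With these helpers, get_ctx(0) on a zeroed array yields the word 0x00000001
and passes. A nested get_ctx(1) yields 0x00000101, which differs from
BIT(8) = 0x00000100, so it is rejected. Decrementing level 1 again restores
0x00000001, and a final put_ctx(0) returns the word to 0.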
[...]