[PATCH bpf-next 2/2] bpf: arm64: Optimize recursion detection by not using atomics

Alexei Starovoitov alexei.starovoitov at gmail.com
Wed Dec 17 10:47:38 PST 2025


On Wed, Dec 17, 2025 at 10:44 AM Puranjay Mohan <puranjay at kernel.org> wrote:
>
> On Wed, Dec 17, 2025 at 6:24 PM Yonghong Song <yonghong.song at linux.dev> wrote:
> >
> >
> >
> > On 12/17/25 8:56 AM, bot+bpf-ci at kernel.org wrote:
> > >> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > >> index 2da986136d26..654fb94bf60c 100644
> > >> --- a/include/linux/bpf.h
> > >> +++ b/include/linux/bpf.h
> > > [ ... ]
> > >
> > >> @@ -2006,12 +2009,32 @@ struct bpf_struct_ops_common_value {
> > >>
> > >>   static inline bool bpf_prog_get_recursion_context(struct bpf_prog *prog)
> > >>   {
> > >> +#ifdef CONFIG_ARM64
> > >> +    u8 rctx = interrupt_context_level();
> > >> +    u8 *active = this_cpu_ptr(prog->active);
> > >> +
> > >> +    active[rctx]++;
> > >> +    barrier();
> > >> +    if (get_unaligned_le32(active) != BIT(rctx * 8))
> > >> +            return false;
> > >> +
> > >> +    return true;
> > >> +#else
> > >>      return this_cpu_inc_return(*(prog->active)) == 1;
> > >> +#endif
> > >>   }
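
For readers following along, here is a minimal user-space sketch of the
byte-packed scheme in this hunk: one counter byte per interrupt context
level, plus a single four-byte load to check that the current increment
is the only active invocation on this CPU. The helper names and the
little-endian-host assumption are illustrative, not the kernel's:

#include <stdio.h>
#include <stdint.h>
#include <string.h>

#define BIT(n)	(1u << (n))

/* One byte per context level: task, softirq, hardirq, NMI. */
static uint8_t active[4];

/* Read all four counters in one load, standing in for
 * get_unaligned_le32() in the hunk above (little-endian host assumed).
 */
static uint32_t all_levels(void)
{
	uint32_t v;

	memcpy(&v, active, sizeof(v));
	return v;
}

/* Allowed only if this increment is the sole active invocation across
 * every context level, matching the old "== 1" semantics.
 */
static int get_recursion_context(int rctx)
{
	active[rctx]++;
	return all_levels() == BIT(rctx * 8);
}

static void put_recursion_context(int rctx)
{
	active[rctx]--;
}

int main(void)
{
	printf("%d\n", get_recursion_context(0)); /* 1: first entry at task level */
	printf("%d\n", get_recursion_context(2)); /* 0: "hardirq" nests over task */
	put_recursion_context(2);
	put_recursion_context(0);
	printf("%d\n", get_recursion_context(0)); /* 1: balanced puts reset state */
	return 0;
}

The point of giving each context level its own byte is that an
interrupting context increments a different byte, so the plain
(non-atomic) read-modify-write on any one byte can never be torn by an
interrupt on the same CPU.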
> > > Can preemption between the increment and the check cause a counter leak
> > > on CONFIG_PREEMPT_RCU systems? The trampoline calls this function under
> > > rcu_read_lock_dont_migrate(), which allows preemption on PREEMPT_RCU
> > > (documented at include/linux/rcupdate.h:856).
> > >
> > > Consider this scenario on an ARM64 system with PREEMPT_RCU:
> > >
> > > 1. Thread A increments active[0] to 1
> > > 2. Preemption occurs before Thread A reaches the check
> > > 3. Thread B on same CPU increments active[0] to 2
> > > 4. Thread B checks: sees 2 != BIT(0), returns false
> > > 5. Thread A resumes, checks: sees 2 != BIT(0), returns false
> > > 6. Both threads return false, neither runs BPF
> > > 7. Neither calls bpf_prog_put_recursion_context() (see
> > >     __bpf_prog_enter_recur() at kernel/bpf/trampoline.c:952)
> > > 8. Counter permanently stuck at 2, all future BPF on this CPU fails
> > >
> > > The old atomic code handled this correctly because this_cpu_inc_return()
> > > increments and reads back the value in a single atomic operation, so
> > > Thread A observes the value 1 before Thread B can interfere. With the
> > > non-atomic sequence, Thread A can be preempted between its increment and
> > > its read, so Thread B's increment leaves both threads observing 2 (as
> > > replayed in the sketch after this message).
> > >
> > > Is there a guarantee that same-context preemption cannot occur in the BPF
> > > execution path on ARM64, or does this need additional synchronization?
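
The eight steps above can be replayed against a small model of the
counter; whether step 7 actually happens is what the replies below
dispute. A self-contained sketch (hypothetical names, little-endian
host assumed):

#include <stdio.h>
#include <stdint.h>
#include <string.h>

#define BIT(n)	(1u << (n))

static uint8_t active[4];

static int check(int rctx)
{
	uint32_t v;

	memcpy(&v, active, sizeof(v));	/* get_unaligned_le32() stand-in */
	return v == BIT(rctx * 8);
}

int main(void)
{
	active[0]++;			/* steps 1-2: A increments, is preempted */
	active[0]++;			/* step 3: B increments on the same CPU  */
	printf("B: %d\n", check(0));	/* step 4: 2 != 1, B gets false          */
	printf("A: %d\n", check(0));	/* steps 5-6: A resumes, also false      */
	/*
	 * Steps 7-8: only if neither thread now decrements active[0]
	 * does the counter stay stuck at 2. That premise is exactly
	 * what the replies below challenge.
	 */
	printf("active[0] = %u\n", active[0]);
	return 0;
}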
> >
> > The AI is correct here. See this other thread:
> >    https://lore.kernel.org/bpf/20251217093326.1745307-1-chen.dylane@linux.dev/T/#m906fd4502fbbedd4609c586122a393363003312a
> > where preempt_disable() is necessary to prevent the above scenario.
>
> See my other reply: the scenario presented by the AI is wrong because
> step 7 is wrong. bpf_prog_put_recursion_context() is called on exit
> even when the program is skipped, so the counter is always rebalanced.

Yep. preempt_disable() is not necessary here; the perf buffer reuse
problem is a different issue.
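
To spell out why step 7 does not occur: the trampoline pairs every
"get" with a "put", and the put side runs even when the program call is
skipped because recursion was detected (see __bpf_prog_enter_recur()
and __bpf_prog_exit_recur() in kernel/bpf/trampoline.c for the real
pairing). A self-contained user-space sketch of that calling pattern,
with illustrative names:

#include <stdbool.h>
#include <stdio.h>
#include <stdint.h>
#include <string.h>

#define BIT(n)	(1u << (n))

static uint8_t active[4];	/* one byte per context level */

static bool get_ctx(int rctx)
{
	uint32_t v;

	active[rctx]++;
	memcpy(&v, active, sizeof(v));	/* little-endian host assumed */
	return v == BIT(rctx * 8);
}

static void put_ctx(int rctx)
{
	active[rctx]--;
}

/* The trampoline pattern: put_ctx() runs unconditionally, so every
 * increment is rebalanced even when the program is not executed.
 */
static void trampoline_call(int rctx)
{
	if (get_ctx(rctx))
		printf("run prog at level %d\n", rctx);
	else
		printf("skip prog at level %d\n", rctx);
	put_ctx(rctx);
}

int main(void)
{
	bool task_ok = get_ctx(0);	/* task-level entry: allowed         */
	trampoline_call(2);		/* "hardirq" nests: skipped, but its
					 * counter is still rebalanced       */
	if (task_ok)
		printf("task prog runs\n");
	put_ctx(0);
	trampoline_call(0);		/* later entries still work: no leak */
	return 0;
}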


