[PATCH bpf 2/3] perf/arm64: Add BRBE support for bpf_get_branch_snapshot()
Puranjay Mohan
puranjay12 at gmail.com
Fri Mar 13 14:03:44 PDT 2026
On Fri, Mar 13, 2026 at 7:59 PM Puranjay Mohan <puranjay12 at gmail.com> wrote:
>
> On Fri, Mar 13, 2026 at 6:04 PM Puranjay Mohan <puranjay at kernel.org> wrote:
> >
> > Implement the perf_snapshot_branch_stack static call for ARM's Branch
> > Record Buffer Extension (BRBE), enabling the bpf_get_branch_snapshot()
> > BPF helper on ARM64.
> >
> > This is a best-effort snapshot helper intended for tracing and debugging
> > use. It favors non-invasive snapshotting over strong serialization, and
> > returns 0 whenever a clean snapshot cannot be obtained. Nested
> > invocations are not serialized; callers may observe a 0-length result
> > when a clean snapshot cannot be preserved.
> >
> > BRBE is paused before the helper does any other work to avoid recording
> > its own branches. The sysreg writes used to pause are branchless.
> > local_daif_save() blocks local exception delivery while reading the
> > buffer. If a PMU overflow raced before that point and re-enabled BRBE,
> > the helper detects the cleared PAUSED state and returns 0.
> >
> > Branch records are read using perf_entry_from_brbe_regset() without
> > event-specific filtering. The BPF program is responsible for applying
> > its own filter criteria. The BRBE buffer is invalidated after reading
> > to maintain contiguity for other consumers.
> >
> > Signed-off-by: Puranjay Mohan <puranjay at kernel.org>
> > ---
> > drivers/perf/arm_brbe.c | 70 ++++++++++++++++++++++++++++++++++++++--
> > drivers/perf/arm_brbe.h | 9 ++++++
> > drivers/perf/arm_pmuv3.c | 5 ++-
> > 3 files changed, 81 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/perf/arm_brbe.c b/drivers/perf/arm_brbe.c
> > index ba554e0c846c..db5e000b2575 100644
> > --- a/drivers/perf/arm_brbe.c
> > +++ b/drivers/perf/arm_brbe.c
> > @@ -9,6 +9,7 @@
> > #include <linux/types.h>
> > #include <linux/bitmap.h>
> > #include <linux/perf/arm_pmu.h>
> > +#include <asm/daifflags.h>
> > #include "arm_brbe.h"
> >
> > #define BRBFCR_EL1_BRANCH_FILTERS (BRBFCR_EL1_DIRECT | \
> > @@ -618,10 +619,10 @@ static bool perf_entry_from_brbe_regset(int index, struct perf_branch_entry *ent
> >
> > brbe_set_perf_entry_type(entry, brbinf);
> >
> > - if (!branch_sample_no_cycles(event))
> > + if (!event || !branch_sample_no_cycles(event))
> > entry->cycles = brbinf_get_cycles(brbinf);
> >
> > - if (!branch_sample_no_flags(event)) {
> > + if (!event || !branch_sample_no_flags(event)) {
> > /* Mispredict info is available for source only and complete branch records. */
> > if (!brbe_record_is_target_only(brbinf)) {
> > entry->mispred = brbinf_get_mispredict(brbinf);
> > @@ -803,3 +804,68 @@ void brbe_read_filtered_entries(struct perf_branch_stack *branch_stack,
> > done:
> > branch_stack->nr = nr_filtered;
> > }
> > +
> > +/*
> > + * Best-effort BRBE snapshot for BPF tracing. Pause BRBE to avoid
> > + * self-recording and return 0 if the snapshot state appears disturbed.
> > + */
> > +int arm_brbe_snapshot_branch_stack(struct perf_branch_entry *entries, unsigned int cnt)
> > +{
> > + unsigned long flags;
> > + int nr_hw, nr_banks, nr_copied = 0;
> > + u64 brbidr, brbfcr, brbcr;
> > +
> > + if (!cnt)
> > + return 0;
> > +
> > + /* Pause BRBE first to avoid recording our own branches. */
> > + brbfcr = read_sysreg_s(SYS_BRBFCR_EL1);
> > + brbcr = read_sysreg_s(SYS_BRBCR_EL1);
> > + write_sysreg_s(brbfcr | BRBFCR_EL1_PAUSED, SYS_BRBFCR_EL1);
> > + isb();
> > +
> > + /* Block local exception delivery while reading the buffer. */
> > + flags = local_daif_save();
> > +
> > + /*
> > + * A PMU overflow before local_daif_save() could have re-enabled
> > + * BRBE, clearing the PAUSED bit. Bail out.
> > + */
> > + if (!(read_sysreg_s(SYS_BRBFCR_EL1) & BRBFCR_EL1_PAUSED))
> > + goto out;
> > +
>
> The code below doesn't implement filtering. I am currently trying to
> figure out the best way to implement that, possibly by reusing
> read_branch_records().
>
After thinking about this more, I don't think we need to filter for a
specific event. The BPF program is not associated with a specific
event; it is associated with the CPU it is running on, so
bpf_get_branch_snapshot() should return the branch records from the
PMU of that CPU. If there are two events on that CPU with different
branch filter types, say PERF_SAMPLE_BRANCH_IND_CALL in one event and
PERF_SAMPLE_BRANCH_ANY_RETURN in the other, the perf subsystem
configures BRBE to record the union and then does per-event filtering
in software (brbe_read_filtered_entries()). bpf_get_branch_snapshot()
should still return everything that was recorded on the CPU, which is
what this patch does.
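
To make the distinction concrete, here is a small user-space sketch of
the two read paths. It is only a model, not the driver code: every
demo_* name, the struct layout and the filter bit values are made up
for illustration, and the real per-event filtering in
brbe_read_filtered_entries() is more involved than a single mask test.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical filter bits, standing in for PERF_SAMPLE_BRANCH_* flags. */
enum {
	DEMO_BRANCH_IND_CALL   = 1u << 0,
	DEMO_BRANCH_ANY_RETURN = 1u << 1,
};

/* Hypothetical, simplified stand-in for a recorded branch entry. */
struct demo_record {
	uint64_t from;
	uint64_t to;
	unsigned int type;	/* exactly one DEMO_BRANCH_* bit */
};

/* The hardware is programmed with the union of all events' filters. */
static unsigned int demo_hw_filter(const unsigned int *event_filters, size_t n)
{
	unsigned int mask = 0;

	for (size_t i = 0; i < n; i++)
		mask |= event_filters[i];
	return mask;
}

/* Per-event path: keep only the records matching this event's filter. */
static size_t demo_read_filtered(const struct demo_record *buf, size_t nr,
				 unsigned int event_filter,
				 struct demo_record *out, size_t cnt)
{
	size_t copied = 0;

	assert(buf && out);
	for (size_t i = 0; i < nr && copied < cnt; i++) {
		if (buf[i].type & event_filter)
			out[copied++] = buf[i];
	}
	return copied;
}

/* Snapshot path: copy whatever was recorded on the CPU, unfiltered. */
static size_t demo_snapshot(const struct demo_record *buf, size_t nr,
			    struct demo_record *out, size_t cnt)
{
	size_t copied = 0;

	assert(buf && out);
	for (size_t i = 0; i < nr && copied < cnt; i++)
		out[copied++] = buf[i];
	return copied;
}
```

With one IND_CALL event and one ANY_RETURN event, the buffer in this
model holds both kinds of records. Each event sees only its own subset
through demo_read_filtered(), while demo_snapshot() returns all of
them, capped at the caller's cnt.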