[PATCH v2 1/7] perf: Increase the maximum number of samples to 256.

Rajnesh Kanwal rkanwal at rivosinc.com
Thu Apr 17 05:51:43 PDT 2025


+ Adding Andi Kleen.

On Thu, Feb 20, 2025 at 6:51 PM Ian Rogers <irogers at google.com> wrote:
>
> On Thu, Jan 16, 2025 at 3:10 PM Rajnesh Kanwal <rkanwal at rivosinc.com> wrote:
> >
> > RISCV CTR extension support a maximum depth of 256 last branch records.
> > The 127 entries limit results in corrupting CTR entries for RISC-V if
> > configured to be 256 entries. This will not impact any other architectures
> > as it is just increasing maximum limit of possible entries.
>
> I wonder if rather than a constant this code should just use the auto
> resizing hashmap code?
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/util/hashmap.h
>
> I assume the value of 127 comes from perf_event.h's PERF_MAX_STACK_DEPTH:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/uapi/linux/perf_event.h#n1252
>
> Perhaps these constants shouldn't exist. The perf-record man page mentions:
> sysctl.kernel.perf_event_max_stack
> which I believe gets a value from
> /proc/sys/kernel/perf_event_max_stack, so maybe these should be
> runtime determined constants rather than compile time.
>

Thanks Ian for your feedback. I am not sure if it's feasible to use auto
resizing hashmap here. On each sample of 256 entries we will be doing
6 callocs and transferring a whole lot of entries in hashmap_grow. We
can't reuse old hashmap as well. On each sample we bear the same cost

But I do agree this should be more dynamic but the maximum number
of entries remove_loops can process is limited by the type of chash array
here. I can change it and related logic to use uint16_t or higher but we
will still have a cap on the number of entries.

PERF_MAX_BRANCH_DEPTH seems to be denoting what remove_loops
can process. This is being used by thread__resolve_callchain_sample to
check if the sample is processable before calling remove_loops. I think
this can't be changed to use perf_event_max_stack. But I can rename
this macro to avoid confusion.

I didn't notice PERF_MAX_STACK_DEPTH. This seems to be defined in
multiple places and touches bpf as well. I agree that we should avoid
using this macro and use runtime determined value instead. Tbh I don't
have super in-depth perf understanding. I will give this a try and send a
patch in the next update. It would be helpful if you can review it.

Thanks
-Rajnesh

> Thanks,
> Ian
>
> > Signed-off-by: Rajnesh Kanwal <rkanwal at rivosinc.com>
> > ---
> >  tools/perf/util/machine.c | 21 ++++++++++++++-------
> >  1 file changed, 14 insertions(+), 7 deletions(-)
> >
> > diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> > index 27d5345d2b30..f2eb3c20274e 100644
> > --- a/tools/perf/util/machine.c
> > +++ b/tools/perf/util/machine.c
> > @@ -2174,25 +2174,32 @@ static void save_iterations(struct iterations *iter,
> >                 iter->cycles += be[i].flags.cycles;
> >  }
> >
> > -#define CHASHSZ 127
> > -#define CHASHBITS 7
> > -#define NO_ENTRY 0xff
> > +#define CHASHBITS 8
> > +#define NO_ENTRY 0xffU
> >
> > -#define PERF_MAX_BRANCH_DEPTH 127
> > +#define PERF_MAX_BRANCH_DEPTH 256
> >
> >  /* Remove loops. */
> > +/* Note: Last entry (i==ff) will never be checked against NO_ENTRY
> > + * so it's safe to have an unsigned char array to process 256 entries
> > + * without causing clash between last entry and NO_ENTRY value.
> > + */
> >  static int remove_loops(struct branch_entry *l, int nr,
> >                         struct iterations *iter)
> >  {
> >         int i, j, off;
> > -       unsigned char chash[CHASHSZ];
> > +       unsigned char chash[PERF_MAX_BRANCH_DEPTH];
> >
> >         memset(chash, NO_ENTRY, sizeof(chash));
> >
> > -       BUG_ON(PERF_MAX_BRANCH_DEPTH > 255);
> > +       BUG_ON(PERF_MAX_BRANCH_DEPTH > 256);
> >
> >         for (i = 0; i < nr; i++) {
> > -               int h = hash_64(l[i].from, CHASHBITS) % CHASHSZ;
> > +               /* Remainder division by PERF_MAX_BRANCH_DEPTH is not
> > +                * needed as hash_64 will anyway limit the hash
> > +                * to CHASHBITS
> > +                */
> > +               int h = hash_64(l[i].from, CHASHBITS);
> >
> >                 /* no collision handling for now */
> >                 if (chash[h] == NO_ENTRY) {
> > --
> > 2.34.1
> >



More information about the linux-riscv mailing list