[PATCH v2 1/7] perf: Increase the maximum number of samples to 256.
Rajnesh Kanwal
rkanwal at rivosinc.com
Thu Apr 17 05:51:43 PDT 2025
+ Adding Andi Kleen.
On Thu, Feb 20, 2025 at 6:51 PM Ian Rogers <irogers at google.com> wrote:
>
> On Thu, Jan 16, 2025 at 3:10 PM Rajnesh Kanwal <rkanwal at rivosinc.com> wrote:
> >
> > RISCV CTR extension support a maximum depth of 256 last branch records.
> > The 127 entries limit results in corrupting CTR entries for RISC-V if
> > configured to be 256 entries. This will not impact any other architectures
> > as it is just increasing maximum limit of possible entries.
>
> I wonder if rather than a constant this code should just use the auto
> resizing hashmap code?
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/util/hashmap.h
>
> I assume the value of 127 comes from perf_event.h's PERF_MAX_STACK_DEPTH:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/uapi/linux/perf_event.h#n1252
>
> Perhaps these constants shouldn't exist. The perf-record man page mentions:
> sysctl.kernel.perf_event_max_stack
> which I believe gets a value from
> /proc/sys/kernel/perf_event_max_stack, so maybe these should be
> runtime determined constants rather than compile time.
>
Thanks Ian for your feedback. I am not sure if it's feasible to use auto
resizing hashmap here. On each sample of 256 entries we will be doing
6 callocs and transferring a whole lot of entries in hashmap_grow. We
can't reuse old hashmap as well. On each sample we bear the same cost
But I do agree this should be more dynamic but the maximum number
of entries remove_loops can process is limited by the type of chash array
here. I can change it and related logic to use uint16_t or higher but we
will still have a cap on the number of entries.
PERF_MAX_BRANCH_DEPTH seems to be denoting what remove_loops
can process. This is being used by thread__resolve_callchain_sample to
check if the sample is processable before calling remove_loops. I think
this can't be changed to use perf_event_max_stack. But I can rename
this macro to avoid confusion.
I didn't notice PERF_MAX_STACK_DEPTH. This seems to be defined in
multiple places and touches bpf as well. I agree that we should avoid
using this macro and use runtime determined value instead. Tbh I don't
have super in-depth perf understanding. I will give this a try and send a
patch in the next update. It would be helpful if you can review it.
Thanks
-Rajnesh
> Thanks,
> Ian
>
> > Signed-off-by: Rajnesh Kanwal <rkanwal at rivosinc.com>
> > ---
> > tools/perf/util/machine.c | 21 ++++++++++++++-------
> > 1 file changed, 14 insertions(+), 7 deletions(-)
> >
> > diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> > index 27d5345d2b30..f2eb3c20274e 100644
> > --- a/tools/perf/util/machine.c
> > +++ b/tools/perf/util/machine.c
> > @@ -2174,25 +2174,32 @@ static void save_iterations(struct iterations *iter,
> > iter->cycles += be[i].flags.cycles;
> > }
> >
> > -#define CHASHSZ 127
> > -#define CHASHBITS 7
> > -#define NO_ENTRY 0xff
> > +#define CHASHBITS 8
> > +#define NO_ENTRY 0xffU
> >
> > -#define PERF_MAX_BRANCH_DEPTH 127
> > +#define PERF_MAX_BRANCH_DEPTH 256
> >
> > /* Remove loops. */
> > +/* Note: Last entry (i==ff) will never be checked against NO_ENTRY
> > + * so it's safe to have an unsigned char array to process 256 entries
> > + * without causing clash between last entry and NO_ENTRY value.
> > + */
> > static int remove_loops(struct branch_entry *l, int nr,
> > struct iterations *iter)
> > {
> > int i, j, off;
> > - unsigned char chash[CHASHSZ];
> > + unsigned char chash[PERF_MAX_BRANCH_DEPTH];
> >
> > memset(chash, NO_ENTRY, sizeof(chash));
> >
> > - BUG_ON(PERF_MAX_BRANCH_DEPTH > 255);
> > + BUG_ON(PERF_MAX_BRANCH_DEPTH > 256);
> >
> > for (i = 0; i < nr; i++) {
> > - int h = hash_64(l[i].from, CHASHBITS) % CHASHSZ;
> > + /* Remainder division by PERF_MAX_BRANCH_DEPTH is not
> > + * needed as hash_64 will anyway limit the hash
> > + * to CHASHBITS
> > + */
> > + int h = hash_64(l[i].from, CHASHBITS);
> >
> > /* no collision handling for now */
> > if (chash[h] == NO_ENTRY) {
> > --
> > 2.34.1
> >
More information about the linux-riscv
mailing list