[PATCH] perf vendor events arm64: Add Tegra410 Olympus PMU events

Ian Rogers irogers at google.com
Fri Jan 30 12:07:31 PST 2026


On Fri, Jan 30, 2026 at 10:08 AM Besar Wicaksono <bwicaksono at nvidia.com> wrote:
>
> Thank you James and Ian for the comments.
> I will try to address the spelling mistakes in v2.
>
> Please see my other comments inline.
>
> > -----Original Message-----
> > From: James Clark <james.clark at linaro.org>
> > Sent: Wednesday, January 28, 2026 3:37 AM
> > To: Ian Rogers <irogers at google.com>; Besar Wicaksono
> > <bwicaksono at nvidia.com>
> > Cc: john.g.garry at oracle.com; will at kernel.org; mike.leach at linaro.org;
> > leo.yan at linux.dev; mark.rutland at arm.com;
> > alexander.shishkin at linux.intel.com; jolsa at kernel.org;
> > adrian.hunter at intel.com; peterz at infradead.org; mingo at redhat.com;
> > acme at kernel.org; namhyung at kernel.org; linux-tegra at vger.kernel.org; linux-
> > arm-kernel at lists.infradead.org; linux-perf-users at vger.kernel.org; linux-
> > kernel at vger.kernel.org; Thomas Makin <tmakin at nvidia.com>; Vikram Sethi
> > <vsethi at nvidia.com>; Rich Wiley <rwiley at nvidia.com>; Sean Kelley
> > <skelley at nvidia.com>; Yifei Wan <ywan at nvidia.com>; Thierry Reding
> > <treding at nvidia.com>; Jon Hunter <jonathanh at nvidia.com>; Matt Ochs
> > <mochs at nvidia.com>
> > Subject: Re: [PATCH] perf vendor events arm64: Add Tegra410 Olympus PMU
> > events
> >
> >
> > On 28/01/2026 8:03 am, Ian Rogers wrote:
> > > On Tue, Jan 27, 2026 at 3:00 PM Besar Wicaksono
> > <bwicaksono at nvidia.com> wrote:
> > >>
> > >> Add JSON files for NVIDIA Tegra410 Olympus core PMU events.
> > >> Also updated the common-and-microarch.json.
> > >>
> > >> Signed-off-by: Besar Wicaksono <bwicaksono at nvidia.com>
> > >> ---
> > >>   .../arch/arm64/common-and-microarch.json      |  90 +++
> > >>   tools/perf/pmu-events/arch/arm64/mapfile.csv  |   1 +
> > >>   .../arch/arm64/nvidia/t410/branch.json        |  45 ++
> > >>   .../arch/arm64/nvidia/t410/brbe.json          |   6 +
> > >>   .../arch/arm64/nvidia/t410/bus.json           |  48 ++
> > >>   .../arch/arm64/nvidia/t410/exception.json     |  62 ++
> > >>   .../arch/arm64/nvidia/t410/fp_operation.json  |  78 ++
> > >>   .../arch/arm64/nvidia/t410/general.json       |  15 +
> > >>   .../arch/arm64/nvidia/t410/l1d_cache.json     | 122 +++
> > >>   .../arch/arm64/nvidia/t410/l1i_cache.json     | 114 +++
> > >>   .../arch/arm64/nvidia/t410/l2d_cache.json     | 134 ++++
> > >>   .../arch/arm64/nvidia/t410/ll_cache.json      | 107 +++
> > >>   .../arch/arm64/nvidia/t410/memory.json        |  46 ++
> > >>   .../arch/arm64/nvidia/t410/metrics.json       | 722 ++++++++++++++++++
> > >>   .../arch/arm64/nvidia/t410/misc.json          | 646 ++++++++++++++++
> > >>   .../arch/arm64/nvidia/t410/retired.json       |  94 +++
> > >>   .../arch/arm64/nvidia/t410/spe.json           |  42 +
> > >>   .../arm64/nvidia/t410/spec_operation.json     | 230 ++++++
> > >>   .../arch/arm64/nvidia/t410/stall.json         | 145 ++++
> > >>   .../arch/arm64/nvidia/t410/tlb.json           | 158 ++++
> > >>   20 files changed, 2905 insertions(+)
> > >>   create mode 100644 tools/perf/pmu-events/arch/arm64/nvidia/t410/branch.json
> > >>   create mode 100644 tools/perf/pmu-events/arch/arm64/nvidia/t410/brbe.json
> > >>   create mode 100644 tools/perf/pmu-events/arch/arm64/nvidia/t410/bus.json
> > >>   create mode 100644 tools/perf/pmu-events/arch/arm64/nvidia/t410/exception.json
> > >>   create mode 100644 tools/perf/pmu-events/arch/arm64/nvidia/t410/fp_operation.json
> > >>   create mode 100644 tools/perf/pmu-events/arch/arm64/nvidia/t410/general.json
> > >>   create mode 100644 tools/perf/pmu-events/arch/arm64/nvidia/t410/l1d_cache.json
> > >>   create mode 100644 tools/perf/pmu-events/arch/arm64/nvidia/t410/l1i_cache.json
> > >>   create mode 100644 tools/perf/pmu-events/arch/arm64/nvidia/t410/l2d_cache.json
> > >>   create mode 100644 tools/perf/pmu-events/arch/arm64/nvidia/t410/ll_cache.json
> > >>   create mode 100644 tools/perf/pmu-events/arch/arm64/nvidia/t410/memory.json
> > >>   create mode 100644 tools/perf/pmu-events/arch/arm64/nvidia/t410/metrics.json
> > >>   create mode 100644 tools/perf/pmu-events/arch/arm64/nvidia/t410/misc.json
> > >>   create mode 100644 tools/perf/pmu-events/arch/arm64/nvidia/t410/retired.json
> > >>   create mode 100644 tools/perf/pmu-events/arch/arm64/nvidia/t410/spe.json
> > >>   create mode 100644 tools/perf/pmu-events/arch/arm64/nvidia/t410/spec_operation.json
> > >>   create mode 100644 tools/perf/pmu-events/arch/arm64/nvidia/t410/stall.json
> > >>   create mode 100644 tools/perf/pmu-events/arch/arm64/nvidia/t410/tlb.json
> > >>
> > >> diff --git a/tools/perf/pmu-events/arch/arm64/common-and-microarch.json b/tools/perf/pmu-events/arch/arm64/common-and-microarch.json
> > >> index 468cb085d879..6af15776ff17 100644
> > >> --- a/tools/perf/pmu-events/arch/arm64/common-and-microarch.json
> > >> +++ b/tools/perf/pmu-events/arch/arm64/common-and-microarch.json
> > >> @@ -179,6 +179,11 @@
> > >>           "EventName": "BUS_CYCLES",
> > >>           "BriefDescription": "Bus cycle"
> > >>       },
> > >> +    {
> > >> +        "EventCode": "0x001E",
> > >> +        "EventName": "CHAIN",
> > >> +        "BriefDescription": "Chain a pair of event counters."
> > >> +    },
> > >
> > > Cool stuff :-)
> > >
> > > For wider counters AMD does something similar, but should this be an
> > > implementation detail rather than an exposed event? How does it
> > > operate as an event?
> > >
> >
> > CHAIN should be excluded from the JSON; it's used internally by the
> > driver when you want 64-bit counters. Userspace can't use it because it
> > can't control where counters are placed to make sure they're adjacent.
> >
>
> Sure, will address this in V2.
>
> > > [snip]
> > >> diff --git a/tools/perf/pmu-events/arch/arm64/mapfile.csv b/tools/perf/pmu-events/arch/arm64/mapfile.csv
> > >> index bb3fa8a33496..7f0eaa702048 100644
> > >> --- a/tools/perf/pmu-events/arch/arm64/mapfile.csv
> > >> +++ b/tools/perf/pmu-events/arch/arm64/mapfile.csv
> > >> @@ -46,3 +46,4 @@
> > >>   0x00000000500f0000,v1,ampere/emag,core
> > >>   0x00000000c00fac30,v1,ampere/ampereone,core
> > >>   0x00000000c00fac40,v1,ampere/ampereonex,core
> > >> +0x000000004e0f0100,v1,nvidia/t410,core
> > >> diff --git a/tools/perf/pmu-events/arch/arm64/nvidia/t410/branch.json b/tools/perf/pmu-events/arch/arm64/nvidia/t410/branch.json
> > >> new file mode 100644
> > >> index 000000000000..532bc59dc573
> > >> --- /dev/null
> > >> +++ b/tools/perf/pmu-events/arch/arm64/nvidia/t410/branch.json
> > >> @@ -0,0 +1,45 @@
> > >> +[
> > >> +    {
> > >> +        "ArchStdEvent": "BR_MIS_PRED",
> > >> +        "PublicDescription": "The Event counts Branches which are speculatively executed and mis-predicted."
> > >
> > > nit: The capitalization on Event and Branches, as well as other words,
> > > is a little unusual.
> > >
> >
> > If there's nothing specific to this CPU then the public description
> > could be left out entirely. The common strings already say the same
> > thing as this:
> >
> >      {
> >          "PublicDescription": "Mispredicted or not predicted branch
> >                                speculatively executed",
> >          "EventCode": "0x10",
> >          "EventName": "BR_MIS_PRED",
> >          "BriefDescription": "Mispredicted or not predicted branch
> >                               speculatively executed"
> >      },
> >
> >
>
> I will check on this and other events.
>
> > > [snip]
> > >> diff --git a/tools/perf/pmu-events/arch/arm64/nvidia/t410/metrics.json b/tools/perf/pmu-events/arch/arm64/nvidia/t410/metrics.json
> > >> new file mode 100644
> > >> index 000000000000..18c2fd58ee9e
> > >> --- /dev/null
> > >> +++ b/tools/perf/pmu-events/arch/arm64/nvidia/t410/metrics.json
> > >> @@ -0,0 +1,722 @@
> > >> +[
> > >> +    {
> > >> +        "MetricName": "backend_bound",
> > >> +        "MetricExpr": "100 * (STALL_SLOT_BACKEND / CPU_SLOT)",
> > >> +        "BriefDescription": "This metric is the percentage of total slots that were stalled due to resource constraints in the backend of the processor.",
> > >> +        "ScaleUnit": "1percent of slots",
> > >> +        "MetricGroup": "TopdownL1"
> > >
> > > Note, on x86 we place TopdownL1 in the Default metric group so it is
> > > shown by `perf stat` when it isn't given events or metrics to compute.
> > > Perhaps you want to do something similar?
> > > https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/arch/x86/icelake/icl-metrics.json?h=tmp.perf-tools-next#n116
> > >
> > > Thanks,
> > > Ian
> >
> > That's currently true for the Arm topdown metrics in sbsa.json, but we
> > probably don't actually want this because there aren't enough counters
> > for the perf stat defaults plus topdown and it results in multiplexing
> > and bad results.
> >
> > I was planning to change it but we also don't currently expose the
> > number of counters available either...
> >
>
> I agree with James; we may not be able to cover all the events in
> the Default + TopdownL1 groups.
>
> Ian, James, in general, is it fine to put some metrics in the same group because
> they are correlated, even though it may cause multiplexing?
> For example, the "Miss_Ratio" group in this patch provides miss ratios for L1, L2, TLB, etc.
> It's convenient to have this and get the miss ratio from all cache levels.
> Many events are needed for this group, and it might not be accurate due to
> multiplexing, but the user can get a (rough) broad view at the beginning.

These things are just strings, so there are no real rules for how they
get organized, other than Default being a special name. For Intel
topdown there's a convention of using _group as a suffix to allow some
kind of drill-down. There's even some support in ilist.py for this:
https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/python/ilist.py?h=perf-tools-next#n477
As writing JSON is painful, there's code for generating metrics from
Python, and that tries to structure metrics inside groups of groups:
https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/intel_metrics.py?h=perf-tools-next#n264
My hope is that at some point this will allow some kind of pretty
printing of headers and the like.
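
To make the "just strings" point concrete, here is a rough sketch (not
code from perf itself) of how the MetricGroup strings can be indexed.
It assumes the usual convention that one metric lists several groups
separated by ';'. Only the backend_bound entry is taken from the patch;
the other two entries, the helper names, and the regex used to pull
event names out of MetricExpr are made up for illustration.

```python
import re
from collections import defaultdict

# Sample entries in the style of a perf metrics.json file.
# backend_bound is from the patch; the other two are hypothetical.
METRICS = [
    {"MetricName": "backend_bound",
     "MetricExpr": "100 * (STALL_SLOT_BACKEND / CPU_SLOT)",
     "MetricGroup": "TopdownL1"},
    {"MetricName": "frontend_bound",
     "MetricExpr": "100 * (STALL_SLOT_FRONTEND / CPU_SLOT)",
     "MetricGroup": "TopdownL1"},
    {"MetricName": "l1d_cache_miss_ratio",
     "MetricExpr": "L1D_CACHE_REFILL / L1D_CACHE",
     "MetricGroup": "Miss_Ratio;L1D_Cache_Effectiveness"},
]

# Assumption: event names are upper-case identifiers inside MetricExpr.
EVENT_RE = re.compile(r"\b[A-Z][A-Z0-9_]*\b")


def groups_of(metric):
    """Split the ';'-separated MetricGroup string into group names."""
    return [g for g in metric.get("MetricGroup", "").split(";") if g]


def index_by_group(metrics):
    """Map each group name to the metric names that list it."""
    index = defaultdict(list)
    for metric in metrics:
        for group in groups_of(metric):
            index[group].append(metric["MetricName"])
    return dict(index)


def events_in_group(metrics, group):
    """Collect the distinct event names used by one group's metrics."""
    events = set()
    for metric in metrics:
        if group in groups_of(metric):
            events.update(EVENT_RE.findall(metric["MetricExpr"]))
    return events


if __name__ == "__main__":
    for group, names in sorted(index_by_group(METRICS).items()):
        used = sorted(events_in_group(METRICS, group))
        print(f"{group}: {names} uses {len(used)} events: {used}")
```

The distinct-event count per group is only a rough hint of how likely a
group is to multiplex; it is not a substitute for knowing how many
counters the PMU actually has.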

Thanks,
Ian

> Thanks,
> Besar


