> -----Original Message-----
> From: James Clark <james.clark at linaro.org>
> Sent: Friday, May 1, 2026 9:02 AM
> To: Besar Wicaksono <bwicaksono at nvidia.com>
> Cc: linux-arm-kernel at lists.infradead.org; linux-kernel at vger.kernel.org; linux-
> tegra at vger.kernel.org; Thierry Reding <treding at nvidia.com>; Jon Hunter
> <jonathanh at nvidia.com>; Vikram Sethi <vsethi at nvidia.com>; Rich Wiley
> <rwiley at nvidia.com>; Shanker Donthineni <sdonthineni at nvidia.com>; Matt
> Ochs <mochs at nvidia.com>; Nirmoy Das <nirmoyd at nvidia.com>; Sean Kelley
> <skelley at nvidia.com>; will at kernel.org; mark.rutland at arm.com;
> yangyccccc at gmail.com
> Subject: Re: [PATCH v3] perf/arm_pmu: Skip PMCCNTR_EL0 on NVIDIA
> Olympus
>
> 
> On 29/04/2026 10:56 pm, Besar Wicaksono wrote:
> > PMCCNTR_EL0 may continue to increment on NVIDIA Olympus CPUs while the
> > PE is in WFI/WFE. That does not necessarily match the CPU_CYCLES event
> > counted by a programmable counter, so using PMCCNTR_EL0 for cycles can
> > give results that differ from the programmable counter path.
> >
> > Extend the existing PMCCNTR avoidance decision from the SMT case to
> > also cover Olympus. Store the result in the common arm_pmu state at
> > registration time, so arm_pmuv3 can keep using a single flag when
> > deciding whether CPU_CYCLES may use PMCCNTR_EL0.
> >
> > Use the cached MIDR from cpu_data to identify Olympus parts and avoid
> > reading MIDR_EL1 in the event path.
> >
> > Signed-off-by: Besar Wicaksono <bwicaksono at nvidia.com>
> > ---
> >
> > Changes from v1:
> > * add CONFIG_ARM64 check to fix build error found by kernel test robot
> > * add explicit include of <asm/cputype.h>
> > v1: https://lore.kernel.org/linux-arm-kernel/20260406232034.2566133-1-bwicaksono at nvidia.com/
> >
> > Changes from v2:
> > * Move the Olympus PMCCNTR avoidance check from arm_pmuv3.c to the
> > common arm_pmu registration path.
> > * Replace the PMUv3-only has_smt flag with avoid_pmccntr, covering both
> > the existing SMT restriction and the Olympus MIDR restriction.
> > * Use the cached per-CPU MIDR from cpu_data instead of calling
> > is_midr_in_range_list() from armv8pmu_can_use_pmccntr().
> > * Add the required asm/cpu.h include for cpu_data.
> > * Drop the use_pmccntr override patch from this revision.
> > v2: https://lore.kernel.org/linux-arm-kernel/20260421203856.3539186-1-bwicaksono at nvidia.com/#t
> >
> > ---
> >   drivers/perf/arm_pmu.c       | 78 +++++++++++++++++++++++++++++++++---
> >   drivers/perf/arm_pmuv3.c     |  8 +---
> >   include/linux/perf/arm_pmu.h |  2 +-
> >   3 files changed, 75 insertions(+), 13 deletions(-)
> >
> > diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
> > index 939bcbd433aa..7df185ee7b74 100644
> > --- a/drivers/perf/arm_pmu.c
> > +++ b/drivers/perf/arm_pmu.c
> > @@ -24,6 +24,8 @@
> > #include <linux/irq.h>
> > #include <linux/irqdesc.h>
> >
> > +#include <asm/cpu.h>
> > +#include <asm/cputype.h>
> > #include <asm/irq_regs.h>
> >
> > static int armpmu_count_irq_users(const struct cpumask *affinity,
> > @@ -920,6 +922,76 @@ void armpmu_free(struct arm_pmu *pmu)
> > kfree(pmu);
> > }
> >
> > +#ifdef CONFIG_ARM64
> > +/*
> > + * List of CPUs that should avoid using PMCCNTR_EL0.
> > + */
> > +static struct midr_range armpmu_avoid_pmccntr_cpus[] = {
> > + /*
> > + * PMCCNTR_EL0 on Olympus CPUs may still increment while in WFI/WFE state.
> > + * This is an implementation specific behavior and not an erratum.
> > + *
> > + * From ARM DDI0487 D14.4:
> > + * It is IMPLEMENTATION SPECIFIC whether CPU_CYCLES and PMCCNTR count
> > + * when the PE is in WFI or WFE state, even if the clocks are not stopped.
> > + *
> > + * From ARM DDI0487 D24.5.2:
> > + * All counters are subject to any changes in clock frequency, including
> > + * clock stopping caused by the WFI and WFE instructions.
> > + * This means that it is CONSTRAINED UNPREDICTABLE whether or not
> > + * PMCCNTR_EL0 continues to increment when clocks are stopped by WFI and
> > + * WFE instructions.
> > + */
> > + MIDR_ALL_VERSIONS(MIDR_NVIDIA_OLYMPUS),
> > + {}
> > +};
> > +
> > +static bool armpmu_is_in_avoid_pmccntr_cpus(int cpu)
> > +{
> > + struct midr_range const *r = armpmu_avoid_pmccntr_cpus;
> > + u32 midr = (u32)per_cpu(cpu_data, cpu).reg_midr;
>
> Hi Besar,
>
> This is still fragile in the way I mentioned on v2: if some of the
> CPUs are not online, cpu_data isn't initialized for those CPUs.
>
> Sashiko suggests using cpumask_any_and(&pmu->supported_cpus,
> cpu_online_mask), and currently the Arm PMUs do require at least one
> CPU online, so it's probably fine. Although it could become fragile if
> we added deferred probing in the future.
>
> The other alternative is to put this in __armv8pmu_probe_pmu(),
> although then both arm_pmuv3 and arm_pmu end up initializing
> cpu_pmu->has_smt. I'm sure there is a way to make it fit somehow.
>
Thanks for the pointers, James and Sashiko. I will try this alternative
approach and add the check in __armv8pmu_probe_pmu(). I would still
rename has_smt to avoid_pmccntr and keep the SMT check in arm_pmu.c.
Regards,
Besar