[PATCH v2 2/2] perf: arm_pmuv3: Don't use PMCCNTR_EL0 on SMT cores
Mark Rutland
mark.rutland at arm.com
Fri Sep 19 02:16:49 PDT 2025
On Fri, Sep 19, 2025 at 04:56:18PM +0800, Yicong Yang wrote:
> On 2025/9/18 21:32, Will Deacon wrote:
> > On Wed, Aug 20, 2025 at 04:45:34PM +0800, Yicong Yang wrote:
> >> diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
> >> index 5c310e803dd7..137ef55d6973 100644
> >> --- a/drivers/perf/arm_pmu.c
> >> +++ b/drivers/perf/arm_pmu.c
> >> @@ -901,6 +901,9 @@ struct arm_pmu *armpmu_alloc(void)
> >>
> >> events = per_cpu_ptr(pmu->hw_events, cpu);
> >> events->percpu_pmu = pmu;
> >> +
> >> + if (!pmu->has_smt && topology_core_has_smt(cpu))
> >> + pmu->has_smt = true;
> >
> > Why isn't that just:
> >
> > pmu->has_smt = topology_core_has_smt(cpu);
> >
> > ?
>
> also works. since one pmu only contains one type of CPU, so just thought
> no need to set it multiple times.
>
> > but then if that's the case, why do we need to stash the result in the
> > PMU at all?
>
> should based on the discussion here [1]. stash it during probe will avoid
> calling {raw_}smp_processor_id() in pmu::event_init() which may be
> horrible for debug in some condition.
>
> [1] https://lore.kernel.org/linux-arm-kernel/aJsV7nzlILHd_ZMa@J2N7QTR9R3/
This isn't about being 'horrible for debug'; my comment there was saying
that the proposed patch was incorrect AND it would be horrible to debug
that in practice when it inevitably went wrong.
The key details are:
(1) We need pmu::event_init() to know whether the cycle counter can be
used such that it doesn't permit a group to be created which can
*NEVER* be scheduled in hardware. Otherwise, the core perf code will
waste time periodically trying to schedule that group when it will
*ALWAYS* be rejected by pmu::add().
(2) The pmu::event_init() call runs in a preemptible context and can
run on any CPU in the system, completely independent of the PMU's
supported CPUs. Thus [raw_]smp_processor_id() tells you nothing
about the CPU(s) the event will run on.
Note that for task-bound events, the event->cpu is -1, so that
doesn't tell us either. Only the PMU instance tells us the set of
CPUs.
We can solve that by either stashing this boolean flag at probe time OR
having pmu::event_init() check something like:
topology_core_has_smt(cpumask_first(pmu->supported_cpus));
... and I think stashing at probe time is nicer/clearer.
Mark.
More information about the linux-arm-kernel
mailing list