[PATCH v2 2/2] perf: arm_pmuv3: Don't use PMCCNTR_EL0 on SMT cores
Yicong Yang
yangyicong at huawei.com
Fri Sep 19 03:27:54 PDT 2025
On 2025/9/19 17:16, Mark Rutland wrote:
> On Fri, Sep 19, 2025 at 04:56:18PM +0800, Yicong Yang wrote:
>> On 2025/9/18 21:32, Will Deacon wrote:
>>> On Wed, Aug 20, 2025 at 04:45:34PM +0800, Yicong Yang wrote:
>
>>>> diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
>>>> index 5c310e803dd7..137ef55d6973 100644
>>>> --- a/drivers/perf/arm_pmu.c
>>>> +++ b/drivers/perf/arm_pmu.c
>>>> @@ -901,6 +901,9 @@ struct arm_pmu *armpmu_alloc(void)
>>>>
>>>>  		events = per_cpu_ptr(pmu->hw_events, cpu);
>>>>  		events->percpu_pmu = pmu;
>>>> +
>>>> +		if (!pmu->has_smt && topology_core_has_smt(cpu))
>>>> +			pmu->has_smt = true;
>>>
>>> Why isn't that just:
>>>
>>> pmu->has_smt = topology_core_has_smt(cpu);
>>>
>>> ?
>>
>> also works. since one pmu only contains one type of CPU, so just thought
>> no need to set it multiple times.
>>
>>> but then if that's the case, why do we need to stash the result in the
>>> PMU at all?
>>
>> should based on the discussion here [1]. stash it during probe will avoid
>> calling {raw_}smp_processor_id() in pmu::event_init() which may be
>> horrible for debug in some condition.
>>
>> [1] https://lore.kernel.org/linux-arm-kernel/aJsV7nzlILHd_ZMa@J2N7QTR9R3/
>
> This isn't about being 'horrible for debug'; my comment there was saying
> that the proposed patch was incorrect AND it would be horrible to debug
> that in practice when it inevitably went wrong.
>
> The key details are:
>
> (1) We need pmu::event_init() to know whether the cycle counter can be
> used such that it doesn't permit a group to be created which can
> *NEVER* be scheduled in hardware. Otherwise, the core perf code will
> waste time periodically trying to schedule that group when it will
> *ALWAYS* be rejected by pmu::add().
>
> (2) The pmu::event_init() call runs in a preemptible context and can
> run on any CPU in the system, completely independent of the PMU's
> supported CPUs. Thus [raw_]smp_processor_id() tells you nothing
> about the CPU(s) the event will run on.
>
> Note that for task-bound events, the event->cpu is -1, so that
> doesn't tell us either. Only the PMU instance tells us the set of
> CPUs.
>
Yes, this is the problem with the previous approach, which used
[raw_]smp_processor_id() in pmu::event_init(). Sorry for the wrong
information in my reply above, and thanks for helping me recall this.
> We can solve that by either stashing this boolean flag at probe time OR
> having pmu::event_init() check something like:
>
> topology_core_has_smt(cpumask_first(pmu->supported_cpus));
>
This works; I didn't think of this approach. pmu->supported_cpus may contain
offline CPUs, but that doesn't matter, since topology_core_has_smt() can also
retrieve the SMT implementation for an offline CPU.
> ... and I think stashing at probe time is nicer/clearer.
>
I feel the same. Will wait for Will's comments :)

Thanks.
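
For reference, the two options discussed above can be modelled in a small
userspace sketch. This is not the driver code: every model_* name, the 8-CPU
topology table and the bitmask standing in for a cpumask are assumptions made
up for illustration. It only shows that both options answer from the PMU
instance and ignore the CPU that event_init() happens to run on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model; none of this is the kernel's own code. */
#define MODEL_NR_CPUS 8

/* Assumed topology: CPUs 0-3 have one thread per core, CPUs 4-7 have two. */
static const int model_threads_per_core[MODEL_NR_CPUS] = {
	1, 1, 1, 1, 2, 2, 2, 2
};

struct model_pmu {
	uint32_t supported_cpus;	/* bitmask standing in for a cpumask */
	bool has_smt;			/* flag stashed at probe time */
};

/* Stand-in for topology_core_has_smt(): reads static topology data only. */
static bool model_core_has_smt(int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	return model_threads_per_core[cpu] > 1;
}

/* Stand-in for cpumask_first(): index of the lowest set bit. */
static int model_cpumask_first(uint32_t mask)
{
	int cpu = 0;

	assert(mask != 0);
	while (!(mask & (1u << cpu)))
		cpu++;
	return cpu;
}

/* Option 1: stash the flag once while walking the PMU's CPUs at probe. */
static void model_pmu_probe(struct model_pmu *pmu, uint32_t supported_cpus)
{
	int cpu;

	pmu->supported_cpus = supported_cpus;
	pmu->has_smt = false;
	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		if ((supported_cpus & (1u << cpu)) && model_core_has_smt(cpu))
			pmu->has_smt = true;
	}
}

/* event_init() with option 1: current_cpu is deliberately unused. */
static bool model_cycle_counter_ok_stashed(const struct model_pmu *pmu,
					   int current_cpu)
{
	(void)current_cpu;
	return !pmu->has_smt;
}

/* event_init() with option 2: query the first supported CPU on demand. */
static bool model_cycle_counter_ok_lookup(const struct model_pmu *pmu,
					  int current_cpu)
{
	(void)current_cpu;
	return !model_core_has_smt(model_cpumask_first(pmu->supported_cpus));
}
```

In this model, a PMU probed with mask 0x0f allows the cycle counter and one
probed with mask 0xf0 rejects it, whichever current_cpu value is passed in.
The two options agree only because each PMU is assumed to cover a single
type of CPU.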