[PATCH v2 2/2] perf: arm_pmuv3: Don't use PMCCNTR_EL0 on SMT cores
Yicong Yang
yangyicong at huawei.com
Fri Sep 19 01:56:18 PDT 2025
On 2025/9/18 21:32, Will Deacon wrote:
> On Wed, Aug 20, 2025 at 04:45:34PM +0800, Yicong Yang wrote:
>> From: Yicong Yang <yangyicong at hisilicon.com>
>>
>> CPU_CYCLES is expected to count the logical CPU (PE) clock. Currently it's
>> preferred to use PMCCNTR_EL0 for counting CPU_CYCLES, but it'll count
>> processor clock rather than the PE clock (ARM DDI0487 L.b D13.1.3) if
>> one of the SMT siblings is not idle on a multi-threaded implementation.
>> So don't use it on SMT cores.
>>
>> Introduce topology_core_has_smt() for knowing the SMT implementation and
>> cached it in arm_pmu::has_smt during allocation.
>>
>> When counting cycles on SMT CPU 2-3 and CPU 3 is idle, without this
>> patch we'll get:
>> [root at client1 tmp]# perf stat -e cycles -A -C 2-3 -- stress-ng -c 1
>> --taskset 2 --timeout 1
>> [...]
>> Performance counter stats for 'CPU(s) 2-3':
>>
>> CPU2 2880457316 cycles
>> CPU3 2880459810 cycles
>> 1.254688470 seconds time elapsed
>>
>> With this patch the idle state of CPU3 is observed as expected:
>> [root at client1 ~]# perf stat -e cycles -A -C 2-3 -- stress-ng -c 1
>> --taskset 2 --timeout 1
>> [...]
>> Performance counter stats for 'CPU(s) 2-3':
>>
>> CPU2 2558580492 cycles
>> CPU3 305749 cycles
>> 1.113626410 seconds time elapsed
>>
>> Signed-off-by: Yicong Yang <yangyicong at hisilicon.com>
>> ---
>> drivers/perf/arm_pmu.c | 3 +++
>> drivers/perf/arm_pmuv3.c | 10 ++++++++++
>> include/linux/arch_topology.h | 11 +++++++++++
>> include/linux/perf/arm_pmu.h | 1 +
>> 4 files changed, 25 insertions(+)
>>
>> diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
>> index 5c310e803dd7..137ef55d6973 100644
>> --- a/drivers/perf/arm_pmu.c
>> +++ b/drivers/perf/arm_pmu.c
>> @@ -901,6 +901,9 @@ struct arm_pmu *armpmu_alloc(void)
>>
>> events = per_cpu_ptr(pmu->hw_events, cpu);
>> events->percpu_pmu = pmu;
>> +
>> + if (!pmu->has_smt && topology_core_has_smt(cpu))
>> + pmu->has_smt = true;
>
> Why isn't that just:
>
> pmu->has_smt = topology_core_has_smt(cpu);
>
> ?
also works. since one pmu only contains one type of CPU, so just thought
no need to set it multiple times.
>
> but then if that's the case, why do we need to stash the result in the
> PMU at all?
>
should based on the discussion here [1]. stash it during probe will avoid
calling {raw_}smp_processor_id() in pmu::event_init() which may be
horrible for debug in some condition.
[1] https://lore.kernel.org/linux-arm-kernel/aJsV7nzlILHd_ZMa@J2N7QTR9R3/
thanks.
More information about the linux-arm-kernel
mailing list