[PATCH v2 2/2] perf: arm_pmuv3: Don't use PMCCNTR_EL0 on SMT cores

Yicong Yang yangyicong at huawei.com
Fri Sep 19 01:56:18 PDT 2025


On 2025/9/18 21:32, Will Deacon wrote:
> On Wed, Aug 20, 2025 at 04:45:34PM +0800, Yicong Yang wrote:
>> From: Yicong Yang <yangyicong at hisilicon.com>
>>
>> CPU_CYCLES is expected to count the logical CPU (PE) clock. Currently it's
>> preferred to use PMCCNTR_EL0 for counting CPU_CYCLES, but it'll count
>> processor clock rather than the PE clock (ARM DDI0487 L.b D13.1.3) if
>> one of the SMT siblings is not idle on a multi-threaded implementation.
>> So don't use it on SMT cores.
>>
>> Introduce topology_core_has_smt() for knowing the SMT implementation and
>> cached it in arm_pmu::has_smt during allocation.
>>
>> When counting cycles on SMT CPU 2-3 and CPU 3 is idle, without this
>> patch we'll get:
>> [root at client1 tmp]# perf stat -e cycles -A -C 2-3 -- stress-ng -c 1
>> --taskset 2 --timeout 1
>> [...]
>>  Performance counter stats for 'CPU(s) 2-3':
>>
>> CPU2           2880457316      cycles
>> CPU3           2880459810      cycles
>>        1.254688470 seconds time elapsed
>>
>> With this patch the idle state of CPU3 is observed as expected:
>> [root at client1 ~]#  perf stat -e cycles -A -C 2-3 -- stress-ng -c 1
>> --taskset 2 --timeout 1
>> [...]
>>  Performance counter stats for 'CPU(s) 2-3':
>>
>> CPU2           2558580492      cycles
>> CPU3               305749      cycles
>>        1.113626410 seconds time elapsed
>>
>> Signed-off-by: Yicong Yang <yangyicong at hisilicon.com>
>> ---
>>  drivers/perf/arm_pmu.c        |  3 +++
>>  drivers/perf/arm_pmuv3.c      | 10 ++++++++++
>>  include/linux/arch_topology.h | 11 +++++++++++
>>  include/linux/perf/arm_pmu.h  |  1 +
>>  4 files changed, 25 insertions(+)
>>
>> diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
>> index 5c310e803dd7..137ef55d6973 100644
>> --- a/drivers/perf/arm_pmu.c
>> +++ b/drivers/perf/arm_pmu.c
>> @@ -901,6 +901,9 @@ struct arm_pmu *armpmu_alloc(void)
>>  
>>  		events = per_cpu_ptr(pmu->hw_events, cpu);
>>  		events->percpu_pmu = pmu;
>> +
>> +		if (!pmu->has_smt && topology_core_has_smt(cpu))
>> +			pmu->has_smt = true;
> 
> Why isn't that just:
> 
> 	pmu->has_smt = topology_core_has_smt(cpu);
> 
> ?

also works. since one pmu only contains one type of CPU, so just thought
no need to set it multiple times.

> 
> but then if that's the case, why do we need to stash the result in the
> PMU at all?
> 

should based on the discussion here [1]. stash it during probe will avoid
calling {raw_}smp_processor_id() in pmu::event_init() which may be
horrible for debug in some condition.

[1] https://lore.kernel.org/linux-arm-kernel/aJsV7nzlILHd_ZMa@J2N7QTR9R3/

thanks.



More information about the linux-arm-kernel mailing list