[PATCH 2/2] perf: arm_pmuv3: Don't use PMCCNTR_EL0 on SMT cores
Yicong Yang
yangyicong at huawei.com
Wed Aug 13 01:03:28 PDT 2025
On 2025/8/12 18:33, Mark Rutland wrote:
> On Tue, Aug 12, 2025 at 04:08:30PM +0800, Yicong Yang wrote:
>> From: Yicong Yang <yangyicong at hisilicon.com>
>>
>> CPU_CYCLES is expected to count the logical CPU (PE) clock. Currently it's
>> preferred to use PMCCNTR_EL0 for counting CPU_CYCLES, but it'll count
>> processor clock rather than the PE clock (ARM DDI0487 L.b D13.1.3) if
>> one of the SMT siblings is not idle on a multi-threaded implementation.
>>
>> So don't use it on SMT cores.
>
> This is rather unfortunate.
>
> When does this actually matter?
>
Any metric derived from cycles, such as IPC, will be affected, as will
cycle-based profiling of code hotspots. The results won't be precise, e.g. if
thread 0 is running at half speed while its sibling thread 1 is running at full
speed. We also sometimes use cycles to detect the current running frequency.
This list of scenarios may not be exhaustive.
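
As a rough illustration of the IPC point (not part of the patch), the sketch
below uses the two CPU3 cycle counts quoted from the commit message further
down. The function names are hypothetical, and the two counts come from
separate runs of slightly different length, so only the order of magnitude is
meaningful:

```c
#include <stdint.h>

/* Instructions per cycle; returns 0.0 when no cycles were counted. */
double ipc(uint64_t instructions, uint64_t cycles)
{
	return cycles ? (double)instructions / (double)cycles : 0.0;
}

/*
 * Factor by which IPC is understated when the counter reports
 * reported_cycles although the PE itself only ran for pe_cycles.
 * For a fixed instruction count this is simply the cycle ratio.
 */
double ipc_understatement(uint64_t reported_cycles, uint64_t pe_cycles)
{
	return pe_cycles ? (double)reported_cycles / (double)pe_cycles : 0.0;
}
```

Calling ipc_understatement(2880459810ULL, 305749ULL) with the idle-CPU3 figures
gives a factor in the thousands, so any IPC computed for the idle sibling from
PMCCNTR_EL0 would be understated by about that much.
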
> Per ARM DDI 0487 L.b, page D14-6918:
>
> | If FEAT_PMUv3p9 is implemented, then CPU_CYCLES does not increment
> | when the clocks are stopped by WFI and WFE instructions. Otherwise, it
> | is CONSTRAINED UNPREDICTABLE whether or not CPU_CYCLES continues to
> | increment when the clocks are stopped by WFI and WFE instructions.
>
> ... so prior to FEAT_PMUv3p9, no-one could rely on the difference
> anyway.
>
>> When counting cycles on SMT CPU 2-3 and CPU 3 is idle, without this
>> patch we'll get:
>> [root@client1 tmp]# perf stat -e cycles -A -C 2-3 -- stress-ng -c 1 --taskset 2 --timeout 1
>> [...]
>> Performance counter stats for 'CPU(s) 2-3':
>>
>> CPU2 2880457316 cycles
>> CPU3 2880459810 cycles
>> 1.254688470 seconds time elapsed
>>
>> With this patch the idle state of CPU3 is observed as expected:
>> [root@client1 ~]# perf stat -e cycles -A -C 2-3 -- stress-ng -c 1 --taskset 2 --timeout 1
>> [...]
>> Performance counter stats for 'CPU(s) 2-3':
>>
>> CPU2 2558580492 cycles
>> CPU3 305749 cycles
>> 1.113626410 seconds time elapsed
>>
>> Signed-off-by: Yicong Yang <yangyicong at hisilicon.com>
>> ---
>> drivers/perf/arm_pmuv3.c | 9 +++++++++
>> 1 file changed, 9 insertions(+)
>>
>> diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c
>> index 95c899d07df5..ed3149632b71 100644
>> --- a/drivers/perf/arm_pmuv3.c
>> +++ b/drivers/perf/arm_pmuv3.c
>> @@ -1002,6 +1002,15 @@ static bool armv8pmu_can_use_pmccntr(struct pmu_hw_events *cpuc,
>> if (has_branch_stack(event))
>> return false;
>>
>> + /*
>> + * The PMCCNTR_EL0 increments from the processor clock rather than
>> + * the PE clock (ARM DDI0487 L.b D13.1.3) which means it'll continue
>> + * counting on a WFI PE if one of its SMT siblings is not idle on a
>> + * multi-threaded implementation. So don't use it on SMT cores.
>> + */
>> + if (cpumask_weight(topology_sibling_cpumask(smp_processor_id())) > 1)
>> + return false;
>
> This effectively forbids use of PMCCNTR_EL0 for any events.
>
> Is there any existing event that it is useful for?
>
I don't think so. We use the core PMU's events in a per-CPU (PE) manner. Users
don't know whether a CPU_CYCLES event is scheduled on PMCCNTR_EL0 or on a common
event counter, so they have no way to rely on PMCCNTR_EL0 specifically either.
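
For anyone who wants to check from userspace whether a CPU falls under the
condition the patch tests (more than one CPU in its sibling mask), here is a
minimal sketch. It assumes the simple "2-3" / "0,4" list format found in
/sys/devices/system/cpu/cpuN/topology/thread_siblings_list; the helper names
are hypothetical and this is not code from the driver:

```c
#include <stdlib.h>

/*
 * Count the CPUs named in a sysfs-style CPU list such as "2-3" or "0,4".
 * A trailing newline is tolerated. Returns -1 on malformed input.
 */
int count_cpus_in_list(const char *list)
{
	int count = 0;

	while (*list && *list != '\n') {
		char *end;
		long first = strtol(list, &end, 10);
		long last = first;

		if (end == list || first < 0)
			return -1;
		if (*end == '-') {
			const char *p = end + 1;

			last = strtol(p, &end, 10);
			if (end == p || last < first)
				return -1;
		}
		count += (int)(last - first + 1);
		if (*end == ',')
			end++;
		else if (*end && *end != '\n')
			return -1;
		list = end;
	}
	return count;
}

/* Userspace analogue of the patch's "sibling mask weight > 1" test. */
int list_is_smt(const char *list)
{
	return count_cpus_in_list(list) > 1;
}
```

Reading thread_siblings_list for a given CPU and passing its contents to
list_is_smt() should tell you whether that CPU would stop using PMCCNTR_EL0
with this patch applied.
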
Thanks.