[PATCH 2/2] perf: arm_pmuv3: Don't use PMCCNTR_EL0 on SMT cores

Yicong Yang yangyicong at huawei.com
Wed Aug 13 01:17:32 PDT 2025


On 2025/8/12 18:22, Mark Rutland wrote:
> On Tue, Aug 12, 2025 at 06:14:33PM +0800, Yicong Yang wrote:
>> On 2025/8/12 18:00, James Clark wrote:
>>> On 12/08/2025 9:08 am, Yicong Yang wrote:
>>>> @@ -1002,6 +1002,15 @@ static bool armv8pmu_can_use_pmccntr(struct pmu_hw_events *cpuc,
>>>>       if (has_branch_stack(event))
>>>>           return false;
>>>> +    /*
>>>> +     * The PMCCNTR_EL0 increments from the processor clock rather than
>>>> +     * the PE clock (ARM DDI0487 L.b D13.1.3) which means it'll continue
>>>> +     * counting on a WFI PE if one of its SMT siblings is not idle on a
>>>> +     * multi-threaded implementation. So don't use it on SMT cores.
>>>> +     */
>>>> +    if (cpumask_weight(topology_sibling_cpumask(smp_processor_id())) > 1)
>>>> +        return false;
>>>> +
>>>
>>> Isn't this something that's static to the PMU? If all CPUs in each PMU are always the same then this doesn't need to be probed every time and can be set once.
>>>
>> we can make use of PMCCNTR_EL0 if SMT is disabled at runtime, e.g. via /sys/devices/system/cpu/smt/control.
>> if we set this at probe time then we permanently lose the chance to use PMCCNTR_EL0.
> 
> Can it be runtime enabled too?
> 

yes.

> If so, then we can't use PMCCNTR_EL0 in case we later dynamically go
> from disabled to enabled.
> 

ok, this will be a problem.

> I do not think this should be handled dynamically.
> 
>>> Also you can't call smp_processor_id() from here because this is
>>> also called in armpmu_event_init() -> __hw_perf_event_init() ->
>>> validate_group() before the event is actually scheduled on a CPU.
>>> With CONFIG_DEBUG_PREEMPT you'd see the error.
>>
>> ok, will use raw_smp_processor_id() instead. it won't affect the validation checking in pmu::event_init().
>> in pmu::add() the cpu id is always stable so it'll also be fine.
> 
> NAK to this.
> 
> It *will* affect validation since it affects the number of events that
> can be placed into a single group (by virtue of allowing or forbidding
> an additional cycles event). That would be non-deterministic, which is
> horrible to debug.
> 

ok.
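
to make the concern concrete, here is a small user-space model of how a per-CPU check changes the result of group validation. it is not kernel code: the number of programmable counters and all the names below are made up for illustration, and only the "more than one online sibling" condition is taken from the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed value for illustration only; real PMUs vary. */
#define NUM_PROGRAMMABLE 6

enum ev_type { EV_CYCLES, EV_OTHER };

/*
 * Stand-in for the check in the patch: the dedicated cycle counter is
 * usable only when the CPU we happen to run on has no other online
 * SMT sibling.
 */
static bool can_use_pmccntr(int online_siblings)
{
	return online_siblings <= 1;
}

/* Returns true if every event in the group can be given a counter. */
static bool validate_group(const enum ev_type *evs, size_t n,
			   int online_siblings)
{
	bool pmccntr_free = can_use_pmccntr(online_siblings);
	size_t used = 0;

	for (size_t i = 0; i < n; i++) {
		if (evs[i] == EV_CYCLES && pmccntr_free) {
			/* The cycles event takes the dedicated counter. */
			pmccntr_free = false;
			continue;
		}
		/* Everything else needs a programmable counter. */
		if (++used > NUM_PROGRAMMABLE)
			return false;
	}
	return true;
}
```

in this model a group of NUM_PROGRAMMABLE other events plus one cycles event validates when online_siblings is 1 and fails when it is 2, so the same group is accepted or rejected depending on where and when the check runs.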

if we want to do it at probe time, we have no general way of knowing whether the CPU
is in a multi-threaded implementation - ACPI and OF detect this in different
ways. topology_sibling_cpumask() only contains the online CPUs, so it also has the
problem mentioned above. can we simply rely on MPIDR_EL1.MT (maybe not, since the spec
mentions it doesn't indicate a multi-threaded implementation), or is there any other suggestion?
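
for reference, checking MPIDR_EL1.MT would amount to something like the sketch below. this is a user-space model operating on an already-read register value. the helper and macro names are made up, only the bit position (24) comes from the architecture, and whether the bit is a reliable indicator is exactly the open question.

```c
#include <stdbool.h>
#include <stdint.h>

/* MT is bit 24 of MPIDR_EL1. */
#define MPIDR_MT_BIT	(UINT64_C(1) << 24)

/* Hypothetical helper: tests the MT bit of a given MPIDR_EL1 value. */
static bool mpidr_mt_set(uint64_t mpidr)
{
	return (mpidr & MPIDR_MT_BIT) != 0;
}
```

unlike the sibling mask, a value like this does not change with CPU hotplug or the smt/control setting, which is why it is tempting for a probe-time decision.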

thanks.




