[PATCH v6 0/4] Add support for AArch64 AMUv1-based arch_freq_get_on_cpu

Jie Zhan zhanjie9 at hisilicon.com
Tue Aug 27 06:03:56 PDT 2024


On 26/08/2024 15:24, Beata Michalska wrote:
> ...
>> Hi Beata,
> Hi Jie,
>> I've recently tested this patchset on a Kunpeng system.
>> It works as expected when reading scaling_cur_freq.
>> The frequency samples are much stabler than what cppc_cpufreq returns.
> Thank you for giving it a spin.
> (and apologies for late reply)
>> A few minor things.
>>
>> 1. I observed larger errors on idle cpus than busy cpus, though it's just up
>> to 1%.
>> Not sure if this comes from the uncertain time interval between the last
>> tick and entering idle.
>> The shorter averaging interval, the larger error, I supposed.
> All right - will look into it.
> Just for my benefit: that diff is strictly between arch_freq_avg_get_on_cpu
> and cpufreq_driver->get(policy->cpu) ?
I can't say whether it's "strictly" between them or not because 
driver->get()
shows a fluctuating value.
On my platform, cppc_cpufreq's driver->get() sometimes shows large errors on
busy cpus (as reported by recent emails), but quite accurate on idle 
cpus (<1%).

With this patch, the "error" on idle cpus as mentioned above is 
typically <1%,
hence better in general.
>> 2. In the current implementation, the resolution of frequency would be:
>> max_freq_khz / SCHED_CAPACITY_SCALE
>> This looks a bit unnecessary to me.
>>
>> It's supposed to get a better resolution if we can do this in
>> arch_freq_get_on_cpu():
>>
>> freq = delta_cycle_cnt * max_freq_khz / delta_const_cnt
>>
>> which may require caching both current and previous sets of counts in the
>> per-cpu struct amu_cntr_sample.
>>
> arch_freq_get_on_cpu relies on the frequency scale factor to derive the average
> frequency. The scale factor is being calculated based on the deltas you have
> mentioned and arch_max_freq_scale which uses SCHED_CAPACITY_SCALE*2 factor to
> accommodate for rather low reference frequencies. arch_freq_get_on_cpu just does
> somewhat reverse computation to that.
Yeah I understood this.

arch_freq_get_on_cpu() now returns:
freq = (arch_scale_freq_capacity() * arch_scale_freq_ref()) >> 
SCHED_CAPACITY_SHIFT

The frequency resolution is (arch_scale_freq_ref() >> 
SCHED_CAPACITY_SHIFT), which
is equivalent to max_freq_khz / SCHED_CAPACITY_SCALE.

If we can directly do
freq = delta_cycle_cnt * ref_freq_khz / delta_const_cnt
in arch_freq_get_on_cpu(), it's supposed to have a 1KHz resolution.
(sorry for the wrong multiplier in the last reply)

Just similar to what's done in [1].

It could be more worthwhile to have a good resolution than utilising the
arch_topology framework?

[1] 
https://lore.kernel.org/all/20240229162520.970986-2-vanshikonda@os.amperecomputing.com/
> ---
> BR
> Beata
>> Kind regards,
>> Jie
>>





More information about the linux-arm-kernel mailing list