[PATCH v2 0/3] Enable EAS for CPPC/ACPI based systems

Thu Apr 7 03:00:43 PDT 2022

Hello,

The following branch contains all the required patches:
https://gitlab.arm.com/linux-arm/linux-pg/-/tree/pg/eas_acpi_v2

Regards,
Pierre

On 4/7/22 11:36, Itaru Kitayama wrote:
> Do you happen to have your own dev git tree that has this series?
> 
> Itaru.
> 
> On Thu, Apr 7, 2022 at 17:54 Pierre Gondois <pierre.gondois at arm.com> wrote:
> 
>     From: Pierre Gondois <Pierre.Gondois at arm.com>
> 
>     v2:
>     - Remove inline hint of cppc_cpufreq_search_cpu_data(). [Mark]
>     - Use EXPORT_SYMBOL_GPL() instead of EXPORT_SYMBOL(). [Mark]
>     - Use a bitmap to squeeze CPU efficiency class values. [Mark]
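For readers unfamiliar with the bitmap approach in the last item, here is a rough Python sketch of the idea, not the patch code. It assumes the raw efficiency class fits in 8 bits; the function name and the sample values are made up for illustration. Each raw class sets one bit, and a CPU's squeezed class is the number of used classes strictly below its own.

```python
# Hypothetical sketch: squeeze sparse efficiency class values into dense indices.
NUM_CLASSES = 256  # assumption: the raw efficiency class is an 8-bit value


def squeeze_efficiency_classes(cpu_classes):
    """Map each CPU's raw class to its rank among the classes actually in use."""
    bitmap = 0
    for raw in cpu_classes.values():
        if not 0 <= raw < NUM_CLASSES:
            raise ValueError(f"efficiency class out of range: {raw}")
        bitmap |= 1 << raw  # mark this class value as used

    squeezed = {}
    for cpu, raw in cpu_classes.items():
        below = bitmap & ((1 << raw) - 1)  # used classes strictly below raw
        squeezed[cpu] = bin(below).count("1")
    return squeezed


# Made-up raw values for a 6-CPU system with two classes in use.
raw_classes = {0: 0, 1: 7, 2: 7, 3: 0, 4: 0, 5: 0}
dense_classes = squeeze_efficiency_classes(raw_classes)
print(dense_classes)
```

The relative ordering of the classes is preserved, which is all the field guarantees anyway.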
> 
>     0. Overview
> 
>     The current Energy Model (EM) for CPUs requires knowledge about CPU
>     performance states and their power consumption. Neither of these
>     is available for ACPI based systems.
> 
>     In ACPI, describing power efficiency of CPUs can be done through the
>     following Arm-specific field:
> 
>     ACPI 6.4, s5.2.12.14 "GIC CPU Interface (GICC) Structure",
>     "Processor Power Efficiency Class field":
>     Describes the relative power efficiency of the associated
>     processor. Lower efficiency class numbers are more efficient than
>     higher ones (e.g. efficiency class 0 should be treated as more
>     efficient than efficiency class 1). However, absolute values
>     of this number have no meaning: 2 isn't necessarily half as
>     efficient as 1.
> 
>     Add an 'efficiency_class' field to describe the relative power
>     efficiency of CPUs. CPUs relying on this field will have performance
>     states (power and frequency values) artificially created. Such EM will
>     be referred to as an artificial EM.
> 
>     The artificial EM is used for the CPPC driver.
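To illustrate the idea only, here is a minimal Python sketch of how artificial performance states could be derived from an efficiency class. The constants, the cost formula, the frequencies and the function name are all assumptions made for this example; the values actually used are defined in the patches themselves.

```python
# Illustrative only: constants and formula are assumptions, not the patch's.
COST_GAP = 1000   # hypothetical cost offset between two efficiency classes
COST_STEP = 1     # hypothetical cost increase between two performance states
NUM_STEPS = 5     # hypothetical number of artificial performance states


def artificial_states(efficiency_class, max_freq_khz):
    """Return (freq_khz, cost) pairs for one CPU; lower cost = more efficient."""
    states = []
    for step in range(1, NUM_STEPS + 1):
        freq = max_freq_khz * step // NUM_STEPS
        cost = COST_GAP * efficiency_class + COST_STEP * step
        states.append((freq, cost))
    return states


# Made-up maximum frequencies for a little (class 0) and a big (class 1) CPU.
little = artificial_states(efficiency_class=0, max_freq_khz=1_000_000)
big = artificial_states(efficiency_class=1, max_freq_khz=2_000_000)

# Every state of the more efficient class is cheaper than any state of the
# less efficient class, as long as COST_GAP > COST_STEP * NUM_STEPS.
print(max(c for _, c in little) < min(c for _, c in big))
```

With such values only the ordering between classes carries meaning, which matches the definition of the field quoted above.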
> 
>     1. Dependencies
> 
>     This patch-set has a dependency on:
>       - [0/8] Introduce support for artificial Energy Model
>         https://lkml.org/lkml/2022/3/16/850
>         which introduces a new callback in the Energy Model (EM) and prevents
>         the registration of devices using power values from an EM when the EM
>         is artificial. Not having this patch-set would break builds.
>       - This patch-set is based on linux-next.
> 
>     2. Testing
> 
>     This patch-set has been tested on a Juno-r2 and a Pixel4. Two types
>     of tests were done: energy testing, and performance testing.
> 
>     The energy testing was done with 2 sets of tasks:
>     - homogeneous tasks (#Tasks at 5% utilization and 16ms period)
>     - heterogeneous tasks (#Tasks at 5|10|15% utilization and 16ms period).
>        If a test has 3 tasks, then there is one with each utilization
>        (1 at 5%, 1 at 10%, 1 at 15%).
>     Tasks spawn on the biggest CPU(s) of the platform. If there are
>     multiple big CPUs, tasks spawn alternately on the big CPUs.
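The two task sets described above can be sketched as follows. This is only a Python illustration of the description; the function name and the dictionary layout are made up and do not correspond to the tooling used for the tests.

```python
from itertools import cycle

PERIOD_MS = 16  # period of every task, as stated above


def make_tasks(n_tasks, utilizations, big_cpus):
    """Build task descriptors, spawning them on the big CPUs in alternation."""
    util_iter = cycle(utilizations)
    cpu_iter = cycle(big_cpus)
    tasks = []
    for _ in range(n_tasks):
        util = next(util_iter)
        tasks.append({
            "util_pct": util,
            "period_ms": PERIOD_MS,
            "runtime_ms": PERIOD_MS * util / 100,  # busy time per period
            "spawn_cpu": next(cpu_iter),
        })
    return tasks


# Juno-r2 example: the big CPUs are CPU1 and CPU2.
homogeneous = make_tasks(10, [5], big_cpus=[1, 2])
heterogeneous = make_tasks(3, [5, 10, 15], big_cpus=[1, 2])
for task in heterogeneous:
    print(task)
```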
> 
>     2.1. Juno-r2 testing
> 
>     The Juno-r2 has 6 CPUs:
>     - 4 little [0, 3-5], max_capa=383
>     - 2 big [1-2], max_capa=1024
>     Base kernel is v5.17-rc5.
> 
>     2.1.1. Energy testing
> 
>     The tests were done on:
>     - a system using a DT and the scmi cpufreq driver. Comparison
>        is done between no-EAS and EAS.
>     - a system using ACPI and the cppc cpufreq driver. Comparison
>        is done between CPPC-no-EAS and CPPC-EAS. CPPC-EAS uses
>        the artificial EM.
> 
>     Energy numbers come from the Juno energy counter, by summing the
>     energy spent by the little and big clusters. Each test was run for
>     5 iterations. Lower energy spending is better.
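For reference, the Mean, ci(+/-) and percentage columns of the tables can be computed along these lines. This is a sketch under assumptions: the confidence level behind ci(+/-) is not stated, so a 95% Student t interval over 5 iterations is assumed here, and the sample readings are made up rather than measured.

```python
from math import sqrt
from statistics import mean, stdev

# Student t critical value for a 95% two-sided interval with 4 degrees of
# freedom (5 iterations). The 95% level is an assumption.
T_95_DF4 = 2.776


def mean_and_ci(samples, t_crit=T_95_DF4):
    """Return the sample mean and the half-width of its confidence interval."""
    half_width = t_crit * stdev(samples) / sqrt(len(samples))
    return mean(samples), half_width


def percent_delta(baseline, value):
    """Relative change of value against baseline, in percent."""
    return (value - baseline) / baseline * 100.0


# Made-up energy readings (Joules) for 5 iterations; not measured data.
samples = [7.6, 7.9, 8.1, 7.8, 8.0]
m, ci = mean_and_ci(samples)
print(f"{m:.2f} +/- {ci:.2f}")

# Delta between two rounded means taken from the first table row.
print(f"{percent_delta(7.89, 6.99):.1f}%")
```

Deltas recomputed from the rounded means can differ slightly from the table, which was presumably computed from unrounded values.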
> 
>     2.1.1.1. Homogeneous tasks
> 
>     Energy results (Joules):
>     +--------+-------------------+-----------------------------+
>     |        |            no-EAS |                         EAS |
>     +--------+---------+---------+-------------------+---------+
>     | #Tasks |    Mean | ci(+/-) |              Mean | ci(+/-) |
>     +--------+---------+---------+-------------------+---------+
>     |    10  |    7.89 |    0.26 |    6.99 (-11.36%) |    0.49 |
>     |    20  |   13.42 |    0.32 |   13.42 ( -0.02%) |    0.08 |
>     |    30  |   21.43 |    0.98 |   21.62 ( +0.87%) |    0.63 |
>     |    40  |   30.03 |    0.82 |   30.31 ( +0.94%) |    0.37 |
>     |    50  |   43.19 |    0.56 |   43.50 ( +0.72%) |    0.52 |
>     +--------+---------+---------+-------------------+---------+
>     +--------+-------------------+-----------------------------+
>     |        |       CPPC-no-EAS |                    CPPC-EAS |
>     +--------+---------+---------+-------------------+---------+
>     | #Tasks |    Mean | ci(+/-) |              Mean | ci(+/-) |
>     +--------+---------+---------+-------------------+---------+
>     |    10  |    7.86 |    0.37 |    5.64 (-28.23%) |    0.05 |
>     |    20  |   13.36 |    0.20 |   10.92 (-18.31%) |    0.31 |
>     |    30  |   19.28 |    0.34 |   18.30 ( -5.07%) |    0.64 |
>     |    40  |   28.33 |    0.59 |   27.13 ( -4.23%) |    0.42 |
>     |    50  |   40.78 |    0.58 |   40.77 ( -0.04%) |    0.45 |
>     +--------+---------+---------+-------------------+---------+
> 
>     Missed activations were measured while comparing CPPC-no-EAS/CPPC-EAS
>     energy values. They were 0.00% for all tests and both
>     configurations. Missed activations start to appear in significant
>     numbers from ~70 tasks.
> 
>     2.1.1.2. Heterogeneous tasks
> 
>     Energy results (Joules):
>     +--------+-------------------+-----------------------------+
>     |        |            no-EAS |                         EAS |
>     +--------+---------+---------+-------------------+---------+
>     | #Tasks |    Mean | ci(+/-) |              Mean | ci(+/-) |
>     +--------+---------+---------+-------------------+---------+
>     |     3  |    5.25 |    0.50 |    4.58 (-12.82%) |    0.07 |
>     |     9  |   12.30 |    0.28 |   11.45 ( -6.97%) |    0.34 |
>     |    15  |   20.06 |    1.32 |   20.60 ( +2.66%) |    1.00 |
>     |    21  |   30.03 |    0.63 |   30.07 ( +0.12%) |    0.41 |
>     +--------+---------+---------+-------------------+---------+
>     +--------+-------------------+-----------------------------+
>     |        |       CPPC-no-EAS |                    CPPC-EAS |
>     +--------+---------+---------+-------------------+---------+
>     | #Tasks |    Mean | ci(+/-) |              Mean | ci(+/-) |
>     +--------+---------+---------+-------------------+---------+
>     |     3  |    4.58 |    0.31 |    3.65 (-20.31%) |    0.05 |
>     |     9  |   11.53 |    0.20 |    9.23 (-19.97%) |    0.22 |
>     |    15  |   19.19 |    0.16 |   18.33 ( -4.49%) |    0.71 |
>     |    21  |   29.07 |    0.29 |   29.06 ( -0.01%) |    0.08 |
>     +--------+---------+---------+-------------------+---------+
> 
>     Missed activations were measured while comparing CPPC-no-EAS/CPPC-EAS
>     energy values. They were 0.00% for all tests and both
>     configurations. Missed activations start to appear in significant
>     numbers from ~36 tasks.
> 
>     2.1.1.3. Analysis
> 
>     The artificial EM often shows better energy gains than the real EM,
>     especially for small loads. Indeed, the artificial power values
>     show a huge energy gain from placing tasks on little CPUs. The 6%
>     margin is always reached, so tasks are easily placed on little
>     CPUs. The margin is not always reached with real power values,
>     leading to tasks staying on big CPUs.
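The 6% margin mentioned in the analysis can be illustrated with a simplified Python model of the EAS placement decision. This is a sketch, not the scheduler code: it assumes the margin is implemented as 1/16 (a right shift by 4) of the energy estimated when the task stays on its previous CPU, and the energy numbers below are made up.

```python
def should_migrate(prev_delta, best_delta, base_energy):
    """Return True if moving the task saves more than ~6% (1/16) of energy.

    prev_delta:  estimated energy increase if the task stays on its previous CPU
    best_delta:  estimated energy increase on the best candidate CPU
    base_energy: estimated energy of the system without the task
    All values are integers in arbitrary units (assumption).
    """
    margin = (prev_delta + base_energy) >> 4  # 1/16, i.e. roughly 6%
    return (prev_delta - best_delta) > margin


# Artificial EM: big CPUs look far more expensive, the margin is easily reached.
print(should_migrate(prev_delta=1000, best_delta=10, base_energy=2000))

# Real power values: the saving can be too small to justify a migration.
print(should_migrate(prev_delta=120, best_delta=110, base_energy=2000))
```

The first call reports a migration and the second does not, which mirrors tasks staying on big CPUs when real power values are used.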
> 
>     2.1.2. Performance testing
> 
>     10 iterations of HackBench with the "--pipe --thread" options and
>     1000 loops. Compared value is the testing time in seconds. A lower
>     timing is better.
>     +----------------+-------------------+---------------------------+
>     |                |       CPPC-no-EAS |                  CPPC-EAS |
>     +--------+-------+---------+---------+-----------------+---------+
>     | Groups | Tasks |    Mean | ci(+/-) |           Mean  | ci(+/-) |
>     +--------+-------+---------+---------+-----------------+---------+
>     |      1 |    40 |    2.39 |    0.19 |   2.39 (-0.24%) |    0.07 |
>     |      2 |    80 |    5.56 |    0.48 |   5.28 (-5.02%) |    0.42 |
>     |      4 |   160 |   12.15 |    0.84 |  12.06 (-0.80%) |    0.48 |
>     |      8 |   320 |   23.03 |    0.94 |  23.12 (+0.36%) |    0.70 |
>     +--------+-------+---------+---------+-----------------+---------+
> 
>     The performance is overall slightly better, but stays within the
>     margin of error.
> 
> 
>     2.2. Pixel4 testing
> 
>     Pixel4 has 8 CPUs:
>     - 4 little [0-3], max_capa=261
>     - 3 medium [4-6], max_capa=861
>     - 1 big [7], max_capa=1024
> 
>     Base kernel is android-10.0.0_r0.81. The performance states advertised
>     in the DT were modified with performance states that would be generated
>     by this patch-set.
>     The artificial EM was set such that little CPUs > medium CPUs > big CPU,
>     meaning little CPUs are the most energy efficient.
>     Comparing the power/capacity ratio, little CPUs' performance states are
>     all more energy efficient than the medium CPUs' performance states.
>     This does not hold when comparing medium and big CPUs.
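One way to read the power/capacity comparison above is sketched below in Python: a cluster is strictly more efficient than another if its worst ratio is still better than the other's best ratio. The (capacity, power) pairs are made up for illustration and are not the Pixel4 values; only the maximum capacities match the list above.

```python
def ratios(states):
    """Power/capacity ratio of each (capacity, power) performance state."""
    return [power / capacity for capacity, power in states]


def strictly_more_efficient(cluster_a, cluster_b):
    """True if every state of cluster_a beats every state of cluster_b."""
    return max(ratios(cluster_a)) < min(ratios(cluster_b))


# Made-up (capacity, power) pairs; not the values used on the Pixel4.
little = [(100, 10), (261, 40)]
medium = [(400, 100), (861, 400)]
big = [(500, 200), (1024, 900)]

print(strictly_more_efficient(little, medium))
print(strictly_more_efficient(medium, big))
```

In this made-up data set the highest medium state has a worse ratio than the lowest big state, so the second check fails even though medium CPUs are more efficient on average.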
> 
>     2.2.1. Energy testing
> 
>     The 2 sets of tests (heterogeneous/homogeneous) were run while
>     recording battery voltage and current (power is obtained by
>     multiplying them).
>     Voltage is averaged over a rolling period of ~11s and current over a
>     period of ~6s. The USB-C cable is plugged in but its power supply is cut.
>     Pixel4 is in airplane mode. The tests last 120s; the first 50s and
>     last 10s are trimmed as the power slowly rises to reach a
>     plateau.
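The post-processing described above (multiply voltage by current, then drop the first 50s and the last 10s of a 120s run) can be sketched in Python as follows. The sample format, the uV/uA units and the synthetic ramp are assumptions made for the example, not the actual logging setup.

```python
def mean_power_uw(samples, skip_start_s=50.0, skip_end_s=10.0, duration_s=120.0):
    """Average power in uW over the retained window.

    samples: iterable of (time_s, voltage_uv, current_ua) tuples (assumed units).
    """
    kept = [
        voltage_uv * current_ua / 1e6  # uV * uA = 1e-12 W = 1e-6 uW
        for time_s, voltage_uv, current_ua in samples
        if skip_start_s <= time_s < duration_s - skip_end_s
    ]
    if not kept:
        raise ValueError("no samples in the retained window")
    return sum(kept) / len(kept)


# Synthetic run: one sample per second, current ramps up for 50s, then stays
# on a plateau of 150 mA at a constant 4 V. Not measured data.
synthetic = [
    (float(t), 4_000_000, 100_000 + min(t, 50) * 1000)
    for t in range(120)
]
plateau_power = mean_power_uw(synthetic)
print(f"{plateau_power:.3e} uW")
```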
>     The following configurations are compared:
>     - android with EAS (but NO_FIND_BEST_TARGET is set):
>        echo ENERGY_AWARE > /sys/kernel/debug/sched_features
>        echo NO_FIND_BEST_TARGET > /sys/kernel/debug/sched_features
>     - android without EAS:
>        echo NO_ENERGY_AWARE > /sys/kernel/debug/sched_features
>     - android with the artificial energy model
>     Lower energy spending is better.
> 
>     2.2.1.1. Homogeneous tasks
> 
>     Energy results (mean power in uW; 6.21+05 stands for 6.21e+05):
>     +--------+-------------------+-----------------------------+
>     |        |       Without EAS |                    With EAS |
>     +--------+---------+---------+-------------------+---------+
>     | #Tasks |    Mean | ci(+/-) |              Mean | ci(+/-) |
>     +--------+---------+---------+-------------------+---------+
>     |    10  | 6.21+05 | 3.12+02 | 5.09+05 (-18.01%) | 2.18+03 |
>     |    20  | 9.12+05 | 9.71+02 | 7.91+05 (-13.26%) | 9.92+02 |
>     |    30  | 1.25+06 | 2.02+03 | 1.09+06 (-12.12%) | 2.00+03 |
>     |    40  | 2.05+06 | 5.15+03 | 1.38+06 (-32.36%) | 1.21+03 |
>     |    50  | 3.03+06 | 6.94+03 | 1.89+06 (-37.44%) | 3.21+03 |
>     +--------+---------+---------+-------------------+---------+
>     +--------+-------------------+-----------------------------+
>     |        |       Without EAS |                  With patch |
>     +--------+---------+---------+-------------------+---------+
>     | #Tasks |    Mean | ci(+/-) |              Mean | ci(+/-) |
>     +--------+---------+---------+-------------------+---------+
>     |    10  | 6.21+05 | 3.12+02 | 4.39+05 (-29.29%) | 5.63+02 |
>     |    20  | 9.12+05 | 9.71+02 | 7.30+05 (-19.90%) | 1.98+03 |
>     |    30  | 1.25+06 | 2.02+03 | 1.01+06 (-18.60%) | 1.72+03 |
>     |    40  | 2.05+06 | 5.15+03 | 1.38+06 (-32.60%) | 3.93+03 |
>     |    50  | 3.03+06 | 6.94+03 | 2.05+06 (-32.08%) | 1.25+04 |
>     +--------+---------+---------+-------------------+---------+
> 
>     2.2.1.2. Heterogeneous tasks
> 
>     Energy results (mean power in uW; 5.14+05 stands for 5.14e+05):
>     +--------+-------------------+-----------------------------+
>     |        |       Without EAS |                    With EAS |
>     +--------+---------+---------+-------------------+---------+
>     | #Tasks |    Mean | ci(+/-) |              Mean | ci(+/-) |
>     +--------+---------+---------+-------------------+---------+
>     |     3  | 5.14+05 | 1.06+03 | 3.76+05 (-26.82%) | 4.58+02 |
>     |     9  | 8.52+05 | 1.18+03 | 7.25+05 (-14.96%) | 1.39+03 |
>     |    15  | 1.42+06 | 3.14+03 | 1.20+06 (-15.41%) | 1.06+04 |
>     |    21  | 2.73+06 | 3.49+03 | 1.49+06 (-45.47%) | 3.43+03 |
>     |    27  | 3.17+06 | 6.92+03 | 2.42+06 (-23.77%) | 8.43+03 |
>     +--------+---------+---------+-------------------+---------+
>     +--------+-------------------+-----------------------------+
>     |        |       Without EAS |                  With patch |
>     +--------+---------+---------+-------------------+---------+
>     | #Tasks |    Mean | ci(+/-) |              Mean | ci(+/-) |
>     +--------+---------+---------+-------------------+---------+
>     |     3  | 5.14+05 | 1.06+03 | 3.82+05 (-25.70%) | 7.67+02 |
>     |     9  | 8.52+05 | 1.18+03 | 7.05+05 (-17.30%) | 9.79+02 |
>     |    15  | 1.42+06 | 3.14+03 | 1.05+06 (-26.00%) | 1.15+03 |
>     |    21  | 2.73+06 | 3.49+03 | 1.53+06 (-43.68%) | 2.23+03 |
>     |    27  | 3.17+06 | 6.92+03 | 2.86+06 ( -9.77%) | 4.26+03 |
>     +--------+---------+---------+-------------------+---------+
> 
>     2.2.1.3. Analysis
> 
>     Similarly to Juno, the artificial performance states show a huge
>     gain from placing tasks on little CPUs, leading to better energy results.
> 
>     2.2.2. Performance testing
> 
>     10 iterations of PCMark. Compared value is the final score
>     (PcmaWorkv3Score). A bigger score is better.
>     +----------------+-------------------------+-------------------------+
>     |    Without EAS |                With EAS |              With patch |
>     +------+---------+---------------+---------+---------------+---------+
>     | Mean | ci(+/-) |          Mean | ci(+/-) |          Mean | ci(+/-) |
>     +------+---------+---------------+---------+---------------+---------+
>     | 8026 |      86 |          8003 |      74 | 7840 (-2.00%) |     104 |
>     +------+---------+---------------+---------+---------------+---------+
> 
>     Performance is lower, but still in the margin of error.
> 
> 
>     3. Summary
> 
>     The artificial performance states show overall better energy results
>     and a small performance decrease. They lead to a more aggressive task
>     placement on the most energy efficient CPUs, and this explains the
>     results.
> 
>     Pierre Gondois (3):
>        cpufreq: CPPC: Add cppc_cpufreq_search_cpu_data
>        cpufreq: CPPC: Add per_cpu efficiency_class
>        cpufreq: CPPC: Register EM based on efficiency class information
> 
>       arch/arm64/kernel/smp.c        |   1 +
>       drivers/cpufreq/cppc_cpufreq.c | 201 +++++++++++++++++++++++++++++++++
>       2 files changed, 202 insertions(+)
> 
>     -- 
>     2.25.1