[PATCH v8 00/10] sched: consolidation of CPU capacity and usage
vincent.guittot at linaro.org
Mon Nov 3 02:55:34 PST 2014
On 3 November 2014 03:12, Wanpeng Li <kernellwp at gmail.com> wrote:
> Hi Vincent,
> On 14/10/31 4:47 PM, Vincent Guittot wrote:
>> This patchset consolidates several changes in the capacity and the usage
>> tracking of the CPU. It provides a frequency invariant metric of the usage of
>> CPUs and generally improves the accuracy of load/usage tracking in the
>> scheduler. The frequency invariant metric is the foundation required for the
>> consolidation of cpufreq and implementation of a fully invariant load
>> tracking. These are currently WIP and require several changes to the load balancer
>> (including how it will use and interpret load and capacity metrics) and
>> extensive validation. The frequency invariance is done with
>> arch_scale_freq_capacity and this patchset doesn't provide the backends of
>> the function which are architecture dependent.
>> As discussed at LPC14, Morten and I have consolidated our changes into a
>> patchset to make it easier to review and merge.
>> During load balance, the scheduler evaluates the number of tasks that a group
>> of CPUs can handle. The current method assumes that tasks have a fixed load of
>> SCHED_LOAD_SCALE and CPUs have a default capacity of SCHED_CAPACITY_SCALE.
>> This assumption generates wrong decisions by creating ghost cores or by
> I don't know the history, could you explain what's the meaning of 'ghost
> cores' ?
The capacity_factor gives the number of tasks that a group of CPUs can
handle. It is computed by dividing the group's capacity by
SCHED_CAPACITY_SCALE.
For a system with SMT, the default capacity of a core is 1178, so the
capacity of each CPU with two threads per core is 589.
At CPU level we have a capacity_factor of 1 = div_round_closest(589, 1024).
At core level we still have a capacity_factor of 1 =
div_round_closest(1178, 1024). This is an intended behavior to promote
1 task per core.
Then, if we have 4 cores in a node, the capacity_factor is 5 =
div_round_closest(4712, 1024) whereas we should have 4. So a 5th ghost
core has appeared in the group, and the load balancer will not
consider the group as overloaded if there are 5 tasks, whereas it
should in order to try to move this 5th task to an idle core (if there
is one).
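
As a rough sketch of the arithmetic above (this is only an illustration,
not code from the patchset: capacity_factor() is a hypothetical helper,
and div_round_closest() stands in for the kernel's DIV_ROUND_CLOSEST()
macro for unsigned values):

```c
#define SCHED_CAPACITY_SCALE 1024UL

/* Unsigned round-to-nearest division, standing in for DIV_ROUND_CLOSEST(). */
static unsigned long div_round_closest(unsigned long x, unsigned long divisor)
{
	return (x + divisor / 2) / divisor;
}

/*
 * Hypothetical helper: number of tasks a group is assumed to handle,
 * given the sum of the capacities of its CPUs.
 */
static unsigned long capacity_factor(unsigned long group_capacity)
{
	return div_round_closest(group_capacity, SCHED_CAPACITY_SCALE);
}
```

With these helpers, capacity_factor(589) and capacity_factor(1178) both
give 1, while capacity_factor(4 * 1178) gives 5 for a node that only has
4 cores.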
Patch  solves some use cases by ensuring that we will not have more
cores than possible, so we can't have more than 4 cores for the previous
example.
Now, if some RT tasks are running and using almost 1 core (1024 as an
example), the capacity_factor is still 4 = div_round_closest(3688,
1024), whereas a core is nearly fully used and the capacity_factor
should be 3.
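
The RT case can be checked the same way. Again this is only a sketch
under assumptions: remaining_capacity() and whole_cores_left() are
made-up helper names, and counting whole cores by integer division is
just one way to show why 3 is the expected answer, not how the scheduler
computes it.

```c
#define SCHED_CAPACITY_SCALE 1024UL
#define SMT_CORE_CAPACITY    1178UL	/* default capacity of an SMT core */

/* Unsigned round-to-nearest division, standing in for DIV_ROUND_CLOSEST(). */
static unsigned long div_round_closest(unsigned long x, unsigned long divisor)
{
	return (x + divisor / 2) / divisor;
}

/* Hypothetical: capacity left in the group once RT usage is subtracted. */
static unsigned long remaining_capacity(unsigned long nr_cores,
					unsigned long rt_usage)
{
	unsigned long total = nr_cores * SMT_CORE_CAPACITY;

	return rt_usage < total ? total - rt_usage : 0;
}

/* capacity_factor as described above: rounds to the nearest task count. */
static unsigned long capacity_factor(unsigned long group_capacity)
{
	return div_round_closest(group_capacity, SCHED_CAPACITY_SCALE);
}

/* Hypothetical: how many full cores' worth of capacity is really left. */
static unsigned long whole_cores_left(unsigned long group_capacity)
{
	return group_capacity / SMT_CORE_CAPACITY;
}
```

For 4 cores and an RT usage of 1024, remaining_capacity() gives 3688,
capacity_factor() still rounds that to 4, and whole_cores_left() gives 3.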
> Wanpeng Li
>> removing real ones when the original capacity of CPUs is different from
>> default SCHED_CAPACITY_SCALE. With this patch set, we don't try anymore to
>> evaluate the number of available cores based on the group_capacity but
>> we evaluate the usage of a group and compare it with its capacity.
>> This patchset mainly replaces the old capacity_factor method by a new one that
>> keeps the general policy almost unchanged. These new metrics will also be used
>> in later patches.