[RFC 3/4] sched: fix computed capacity for HMP

Vincent Guittot vincent.guittot at linaro.org
Fri Mar 28 09:22:28 EDT 2014


The current sg_capacity solves the ghost cores issue for SMT systems and for
clusters made of big cores whose cpu_power is above SCHED_POWER_SCALE at
core level. But it still removes some real cores from a cluster made of LITTLE
cores whose cpu_power is below SCHED_POWER_SCALE.

Instead of using power_orig to detect an SMT system and compute an smt factor
that is then used to calculate the real number of cores, we set a core_fct
field when building the sched_domain topology. An SMT system can be detected
thanks to the SD_SHARE_CPUPOWER flag, and core_fct records how many CPUs per
core we have. core_fct ensures that sg_capacity returns the core capacity of
an SMT system without removing any real core from a LITTLE cluster.
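
For illustration, here is a minimal user-space sketch (the simplified
topology description is hypothetical, not the kernel's
sched_domain/sched_group layout) that applies the same three-way decision
bottom-up. On a 2-way SMT machine every level above SMT ends up with
core_fct = 2, while a LITTLE cluster without SMT keeps core_fct = 1 at
every level:

/* Minimal sketch, not kernel code: it only mimics how core_fct is
 * chosen while walking simplified topology levels bottom-up. */
#include <stdio.h>

#define SD_SHARE_CPUPOWER	0x1	/* same meaning as the kernel flag */

struct level {
	const char *name;
	int flags;		/* flags of this level's domain */
	int cpus_per_group;	/* CPUs spanned by one group at this level */
};

static void walk(const char *what, const struct level *lvl, int nr)
{
	int i, core_fct = 1;

	printf("%s:\n", what);
	for (i = 0; i < nr; i++) {
		if (i == 0)				/* no child domain */
			core_fct = 1;
		else if (lvl[i - 1].flags & SD_SHARE_CPUPOWER)
			core_fct = lvl[i].cpus_per_group;
		/* else: inherit the child's core_fct, i.e. keep it */

		printf("  %-3s core_fct = %d\n", lvl[i].name, core_fct);
	}
}

int main(void)
{
	const struct level ht_box[] = {		/* 8 cores, 2-way SMT */
		{ "SMT", SD_SHARE_CPUPOWER, 2 },
		{ "MC",  0,                 2 },	/* one core per group */
		{ "CPU", 0,                16 },
	};
	const struct level little[] = {		/* LITTLE cluster, no SMT */
		{ "MC",  0, 1 },
		{ "CPU", 0, 4 },
	};

	walk("HT system", ht_box, 3);
	walk("LITTLE cluster", little, 2);
	return 0;
}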

This method also fixes a use case where the capacity of an SMT system was
overrated.
Take the example of an 8-core HT system (16 CPUs, cpu_power of 589 each):
At the CPU level, sg_capacity is capped to a maximum capacity of 8, whereas
DIV_ROUND_CLOSEST(power, SCHED_POWER_SCALE) returns 9:
((589*16) / 1024) = 9.2
Now if 2 CPUs (1 core) are fully loaded by rt tasks, sg_capacity still returns
a capacity of 8 whereas it should return a capacity of 7. This happens because
DIV_ROUND_CLOSEST(power, SCHED_POWER_SCALE) is still above 7.5:
((589*14) / 1024) = 8.05
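
With the new formula, capacity = DIV_ROUND_CLOSEST(power * cpus,
power_orig * core_fct), both cases above give the expected result. A small
stand-alone check (plain user-space C with DIV_ROUND_CLOSEST redefined
locally using the kernel's rounding for positive values; the numbers are
the ones from the example above, not measured):

#include <stdio.h>

#define DIV_ROUND_CLOSEST(x, d)	(((x) + ((d) / 2)) / (d))

int main(void)
{
	unsigned long cpus = 16, core_fct = 2;	/* 8 cores, 2-way SMT */
	unsigned long power_orig = 589 * 16;	/* cpu_power of 589 per CPU */
	unsigned long power_idle = 589 * 16;	/* nothing stolen by rt */
	unsigned long power_rt   = 589 * 14;	/* 1 core (2 CPUs) on rt */

	printf("capacity, all CPUs free: %lu\n",	/* prints 8 */
	       DIV_ROUND_CLOSEST(power_idle * cpus, power_orig * core_fct));
	printf("capacity, 1 core on rt: %lu\n",		/* prints 7 */
	       DIV_ROUND_CLOSEST(power_rt * cpus, power_orig * core_fct));
	return 0;
}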

Signed-off-by: Vincent Guittot <vincent.guittot at linaro.org>
---
 kernel/sched/core.c  | 7 +++++++
 kernel/sched/fair.c  | 6 ++----
 kernel/sched/sched.h | 2 +-
 3 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f9d9776..5b20b27 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5844,6 +5844,13 @@ static void init_sched_groups_power(int cpu, struct sched_domain *sd)
 
 	WARN_ON(!sg);
 
+	if (!sd->child)
+		sg->core_fct = 1;
+	else if (sd->child->flags & SD_SHARE_CPUPOWER)
+		sg->core_fct = cpumask_weight(sched_group_cpus(sg));
+	else
+		sg->core_fct = sd->child->groups->core_fct;
+
 	do {
 		sg->group_weight = cpumask_weight(sched_group_cpus(sg));
 		sg = sg->next;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index ed42061..7387c05 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5773,12 +5773,10 @@ static inline int sg_capacity(struct lb_env *env, struct sched_group *group)
 	power = group->sgp->power;
 	power_orig = group->sgp->power_orig;
 	cpus = group->group_weight;
+	smt = group->core_fct;
 
-	/* smt := ceil(cpus / power), assumes: 1 < smt_power < 2 */
-	smt = DIV_ROUND_UP(SCHED_POWER_SCALE * cpus, power_orig);
-	capacity = cpus / smt; /* cores */
+	capacity = DIV_ROUND_CLOSEST(power * cpus, power_orig * smt);
 
-	capacity = min_t(unsigned, capacity, DIV_ROUND_CLOSEST(power, SCHED_POWER_SCALE));
 	if (!capacity)
 		capacity = fix_small_capacity(env->sd, group);
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index c9007f2..46c3784 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -759,7 +759,7 @@ struct sched_group {
 	struct sched_group *next;	/* Must be a circular list */
 	atomic_t ref;
 
-	unsigned int group_weight;
+	unsigned int group_weight, core_fct;
 	struct sched_group_power *sgp;
 
 	/*
-- 
1.9.0



