[PATCH v2 08/11] sched: get CPU's activity statistic

Morten Rasmussen morten.rasmussen at arm.com
Wed Jun 4 03:00:22 PDT 2014


On Wed, Jun 04, 2014 at 10:32:10AM +0100, Vincent Guittot wrote:
> On 4 June 2014 10:08, Peter Zijlstra <peterz at infradead.org> wrote:
> > On Wed, Jun 04, 2014 at 09:47:26AM +0200, Vincent Guittot wrote:
> >> On 3 June 2014 17:50, Peter Zijlstra <peterz at infradead.org> wrote:
> >> > On Wed, May 28, 2014 at 04:47:03PM +0100, Morten Rasmussen wrote:
> >> >> Since we may do periodic load-balance every 10 ms or so, we will perform
> >> >> a number of load-balances where runnable_avg_sum will mostly be
> >> >> reflecting the state of the world before a change (new task queued or
> >> >> moved a task to a different cpu). If you have two tasks continuously
> >> >> on one cpu and your other cpu is idle, and you move one of the tasks to
> >> >> the other cpu, runnable_avg_sum will remain unchanged, 47742, on the
> >> >> first cpu while it starts from 0 on the other one. 10 ms later it will
> >> >> have increased a bit, 32 ms later it will be 47742/2, and 345 ms later
> >> >> it reaches 47742. In the meantime the cpu doesn't appear fully utilized
> >> >> and we might decide to put more tasks on it because we don't know if
> >> >> runnable_avg_sum represents a partially utilized cpu (for example a 50%
> >> >> task) or if it will continue to rise and eventually get to 47742.
> >> >
> >> > Ah, no, since we track per task, and update the per-cpu ones when we
> >> > migrate tasks, the per-cpu values should be instantly updated.
> >> >
> >> > If we were to increase per task storage, we might as well also track
> >> > running_avg not only runnable_avg.
> >>
> >> I agree that the removed running_avg should give more useful
> >> information about the load of a CPU.
> >>
> >> The main issue with running_avg is that it's disturbed by other tasks
> >> (as pointed out previously). As a typical example, if we have 2 tasks
> >> with a load of 25% on 1 CPU, the unweighted runnable_load_avg will be
> >> in the range of [100% - 50%] depending on the parallelism of the
> >> runtime of the tasks, whereas the reality is 50% and the use of
> >> running_avg will return this value.
> >
> > I'm not sure I see how 100% is possible, but yes I agree that runnable
> > can indeed be inflated due to this queueing effect.
> 
> In fact, it can be even worse than that because I forgot to take into
> account the geometric series effect, which implies that it depends on
> the runtime (idletime) of the task.
> 
> Take 3 examples:
> 
> 2 tasks that need to run 10ms simultaneously every 40ms. If they share
> the same CPU, they will be on the runqueue 20ms (in fact a bit less
> for one of them). Their load (runnable_avg_sum/runnable_avg_period)
> will be 33% each so the unweighted runnable_load_avg of the CPU will
> be 66%.

Right, it actually depends on how often you switch between the tasks. If
your sched_tick happens every 10 ms then in this example one task will
run for 10 ms and be done, while the other one waits for 10 ms and then
runs to completion in 10 ms. The result is that one task is runnable for
10 ms and the other one is runnable for 20 ms. That gives you 25% and
50% for a total of 75%.
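
The arithmetic above can be sketched in a few lines of Python. This is
only an illustration, not scheduler code: it assumes both tasks wake at
time 0, that the tasks are switched round-robin with a fixed slice equal
to the tick, and it reports the plain runnable-time/period ratio without
any geometric decay. The names runnable_ms and runnable_share are made
up for this example.

```python
def runnable_ms(run_ms, slice_ms, n_tasks=2):
    """Time (ms) each task stays runnable when n_tasks wake together.

    Assumption: strict round-robin with a fixed slice; a task is
    runnable from t=0 until the moment it finishes its run_ms of work.
    """
    remaining = [run_ms] * n_tasks
    done_at = [0] * n_tasks
    now = 0
    while any(r > 0 for r in remaining):
        for i in range(n_tasks):
            if remaining[i] == 0:
                continue  # this task has already completed
            ran = min(slice_ms, remaining[i])
            now += ran
            remaining[i] -= ran
            if remaining[i] == 0:
                done_at[i] = now
    return done_at


def runnable_share(run_ms, period_ms, slice_ms, n_tasks=2):
    """Undecayed runnable ratio of each task over one period."""
    return [t / period_ms for t in runnable_ms(run_ms, slice_ms, n_tasks)]


# 10 ms of work every 40 ms, 10 ms slice: one task is runnable for
# 10 ms, the other for 20 ms.
shares = runnable_share(run_ms=10, period_ms=40, slice_ms=10)
print(shares, sum(shares))
```

With a slice shorter than the busy period (for example run_ms=50,
period_ms=200, slice_ms=10) both tasks spend time waiting, and the
summed ratio goes up accordingly.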

> 
> 2 tasks that need to run 25ms simultaneously every 100ms. If they share
> the same CPU, they will be on the runqueue 50ms (in fact a bit less
> for one of them). Their load (runnable_avg_sum/runnable_avg_period)
> will be 74% each so the unweighted runnable_load_avg of the CPU will
> be 148%.
> 
> 2 tasks that need to run 50ms simultaneously every 200ms. If they
> share the same CPU, they will be on the runqueue 100ms (in fact a bit
> less for one of them). Their load
> (runnable_avg_sum/runnable_avg_period) will be 89% each so the
> unweighted runnable_load_avg of the CPU will be 180%.

In this case you switch between the tasks before they complete, so both
tasks are waiting and runnable will get much higher than 75%. You are
right. I was thinking about tasks where the busy period is short enough
that they run to completion when they are scheduled.
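
For reference, the geometric series effect can be sketched as below.
This is a simplified floating-point model, not the kernel's fixed-point
code: it assumes a decay factor y with y^32 = 0.5 per millisecond, that
runnable_avg_period has already saturated, and that the ratio is sampled
at the end of the runnable window, where it peaks. The names
ramp_fraction and peak_ratio are made up for this example.

```python
HALF_LIFE_MS = 32                  # contribution halves every 32 ms
Y = 0.5 ** (1.0 / HALF_LIFE_MS)    # per-millisecond decay factor


def ramp_fraction(busy_ms):
    """Fraction of the maximum sum reached after busy_ms of continuous
    runnable time, starting from zero."""
    return 1.0 - Y ** busy_ms


def peak_ratio(runnable_ms, period_ms):
    """Steady-state runnable_avg_sum / runnable_avg_period for a task
    that is runnable for runnable_ms out of every period_ms, sampled at
    the end of the runnable window (assumes a saturated period sum)."""
    return (1.0 - Y ** runnable_ms) / (1.0 - Y ** period_ms)


# Ramp-up from zero: half of the maximum after 32 ms, and within 0.1%
# of the maximum after 345 ms.
print(ramp_fraction(32), ramp_fraction(345))

# Periodic tasks: runnable window (ms) and period (ms).
for window, period in [(10, 40), (20, 40), (50, 100), (100, 200)]:
    print(window, period, round(peak_ratio(window, period), 3))
```

Under these assumptions a task that is runnable for 50 ms out of 100 ms
peaks at roughly 0.747 and one that is runnable for 100 ms out of 200 ms
at roughly 0.897, in line with the 74% and 89% figures quoted above. In
the same model a 10 ms window out of 40 ms gives roughly 0.336 and a
20 ms window roughly 0.607.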


