[PATCH] sched/rt: fix incorrect schedstats for rt thread
Dengjun Su
dengjun.su at mediatek.com
Thu Jan 8 23:24:47 PST 2026
On Thu, 2026-01-08 at 12:16 +0100, Peter Zijlstra wrote:
> On Thu, Jan 08, 2026 at 11:13:07AM +0800, Dengjun Su wrote:
> > For RT thread, only 'set_next_task_rt' will call
> > 'update_stats_wait_end_rt' to update schedstats information.
> > However, during the RT migration process,
> > 'update_stats_wait_start_rt' will be called twice, which
> > will cause the values of wait_max and wait_sum to be incorrect.
>
> Right, that loses time. Also note that I think dl has the same
> issue.
Hi Peter,
Thanks for the feedback. Yes, sorry for missing the dl class;
I will fix it in v2.
>
> > The specific output as follows:
> > $ cat /proc/6046/task/6046/sched | grep wait
> > wait_start : 0.000000
> > wait_max : 496717.080029
> > wait_sum : 7921540.776553
> >
> > Add 'update_stats_wait_end_rt' in 'update_stats_dequeue_rt' to
> > update schedstats information when dequeue_task.
>
> This needs a few more words on why this is correct -- notably it took
> me a little time to find the 'task_on_rq_migrating()' case in
> __update_stats_wait_end() which makes this not actually 'end'.
>
> But then the corresponding clause in __update_stats_wait_start()
> gives me a headache:
>
> 'wait_start > prev_wait_start'
>
> I mean, wtf. Should that not equally be using
> task_on_rq_migrating()?
>
> Can you please take a hard look at all that and fix up things
> all-round?
>
A complete schedstats update flow for a migration should be:

  __update_stats_wait_start() [enter queue A, stage 1] ->
  __update_stats_wait_end()   [leave queue A, stage 2] ->
  __update_stats_wait_start() [enter queue B, stage 3] ->
  __update_stats_wait_end()   [start running on queue B, stage 4]

Stage 1: prev_wait_start is 0, so wait_start records the time of
entering queue A.

Stage 2: task_on_rq_migrating(p) is true, so wait_start is overwritten
with the time spent waiting on queue A.

Stage 3: prev_wait_start is the waiting time on queue A and wait_start
is the time of entering queue B, which is expected to be greater than
prev_wait_start. In that case wait_start is set to
(time of entering queue B) - (waiting time on queue A).

Stage 4: the final wait time
= (time when starting to run on queue B) - (time of entering queue B)
  + (waiting time on queue A)
= waiting time on queue B + waiting time on queue A.

The current problem is that stage 2 does not call
__update_stats_wait_end(), so wait_start still holds the time of
entering queue A when stage 3 runs. The final computed wait time then
becomes (waiting time on queue B) + (the moment of entering queue A),
which makes wait_max and wait_sum incorrect.
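
To make the arithmetic concrete, here is a small userspace model of the
two helpers. It is only a sketch, not the kernel code: the struct, the
model_* names, simulate_migration() and all timestamps are made up for
illustration, and it assumes both queues share a single clock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the wait fields of struct sched_statistics. */
struct wait_stats {
	uint64_t wait_start;
	uint64_t wait_max;
	uint64_t wait_sum;
	uint64_t wait_count;
};

/* Models __update_stats_wait_start(): subtract any carried-over wait. */
void model_wait_start(struct wait_stats *s, uint64_t now)
{
	uint64_t wait_start = now;
	uint64_t prev_wait_start = s->wait_start;

	if (wait_start > prev_wait_start)
		wait_start -= prev_wait_start;

	s->wait_start = wait_start;
}

/* Models __update_stats_wait_end(); 'migrating' plays the role of
 * task_on_rq_migrating(p). */
void model_wait_end(struct wait_stats *s, uint64_t now, bool migrating)
{
	uint64_t delta;

	/* The model assumes the clock never runs behind wait_start. */
	assert(now >= s->wait_start);
	delta = now - s->wait_start;

	if (migrating) {
		/* Stage 2: keep the wait accumulated so far in wait_start. */
		s->wait_start = delta;
		return;
	}

	/* Stage 4: account the total wait and reset wait_start. */
	if (delta > s->wait_max)
		s->wait_max = delta;
	s->wait_count++;
	s->wait_sum += delta;
	s->wait_start = 0;
}

/* Runs stages 1-4 with made-up timestamps and returns wait_sum.
 * end_on_dequeue == false skips stage 2, like the current RT code. */
uint64_t simulate_migration(bool end_on_dequeue)
{
	struct wait_stats s = { 0 };

	model_wait_start(&s, 1000);		/* stage 1: enter queue A */
	if (end_on_dequeue)
		model_wait_end(&s, 1300, true);	/* stage 2: leave queue A */
	model_wait_start(&s, 1400);		/* stage 3: enter queue B */
	model_wait_end(&s, 1900, false);	/* stage 4: run on queue B */

	return s.wait_sum;
}
```

With these numbers the task waits 300 on queue A and 500 on queue B.
simulate_migration(true) gives 800, the sum of both waits, while
simulate_migration(false) gives 1500, which is the 500 waited on queue B
plus the timestamp 1000 at which the task entered queue A.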

For __update_stats_wait_end(), task_on_rq_migrating(p) is needed to
distinguish stage 2 from stage 4, because the two are handled
differently. __update_stats_wait_start() does not need to distinguish
stage 1 from stage 3: the same subtraction covers both, since
prev_wait_start is 0 in stage 1.

As for the wait_start > prev_wait_start condition, I think it is more
of a safeguard against statistical errors caused by inconsistent
timestamps.

Thanks