[RFC PATCH 1/1] sched/deadline: Fix RT task potential starvation when expiry time passed
Kuyo Chang
kuyo.chang at mediatek.com
Thu Jun 19 20:00:53 PDT 2025
On Thu, 2025-06-19 at 15:13 +0200, Juri Lelli wrote:
>
> On 18/06/25 22:20, Kuyo Chang wrote:
>
> ...
>
> > When dl_defer_running = 1 and the running time has been exhausted,
> > it means that the dl_server should stop at this point.
> > However, if start_dl_timer() returns a failure, it indicates that the
> > actual time spent consuming the running time was unexpectedly long.
> >
> > At this point, there are two options:
> > [as-is] 1. re-enqueuing the dl entity with ENQUEUE_REPLENISH will clear
> > the throttled flag and re-enqueue the dl entity to keep the fair_server
> > running.
> > enqueue_dl_entity(dl_se, ENQUEUE_REPLENISH);
> >   => replenish_dl_entity
> >     => replenish_dl_new_period(dl_se, rq);
> >     => dl_se->dl_yielded = 0;
> >     => dl_se->dl_throttled = 0;
> >   => __enqueue_dl_entity(dl_se);
> >
> > [to-be] 2. To avoid RT latency, the fair_server should remain throttled
> > while replenishing the dl_se.
> > Once replenishing is complete, we can ensure that a timer is
> > successfully started.
> > When the timer is triggered, the throttled state will be cleared,
> > ensuring that RT tasks can execute during this interval.
> >
> > It is a policy decision for dealing with the case of failure in
> > start_dl_timer().
> > The second approach is better for real-time (RT) latency in my opinion,
> > as RT tasks must be prioritized.
>
> OK, I think I see your points, but I am still not sure I fully
> understand the link with the issue you describe in the changelog - the
> relation with "DL replenish lagged too much", that is.
>
> Could you please expand on the details of the situation that is opening
> up for the issue your patch is addressing? Do you know why we hit the
> corner case that causes the warning in the first place?
>
"DL replenish lagged too much" means the fair_server took much longer
than expected to use up its running time, so the deadline fell far
behind the clock (which is also why start_dl_timer() failed).
In this situation, replenishing by just one dl_period isn't enough to
catch up.
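
To make the "lagged too much" condition concrete, here is a small
user-space sketch. It is not kernel code: struct toy_dl_se and both
helpers are hypothetical names, and it assumes the replenishment timer
can only be armed for an instant that is still in the future.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for the deadline entity fields discussed here (assumed). */
struct toy_dl_se {
    uint64_t deadline;  /* absolute deadline, ns */
    uint64_t period;    /* replenishment period, ns */
};

/* Assumption: a timer can only be armed for a future instant. */
static bool toy_timer_can_start(const struct toy_dl_se *se, uint64_t now)
{
    return se->deadline > now;
}

/* Whole periods needed to move the deadline strictly past "now". */
static uint64_t toy_periods_to_catch_up(const struct toy_dl_se *se,
                                        uint64_t now)
{
    assert(se->period > 0);
    if (se->deadline > now)
        return 0;
    return (now - se->deadline) / se->period + 1;
}
```

With deadline = 100, period = 50 and now = 260, one replenishment only
moves the deadline to 150, which is still behind the clock, so the
timer cannot be armed yet.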
A corner case is when there are too many IRQs or IPIs in the system.
In this case, runtime gets consumed very slowly, and the fair_server
keeps running without being throttled.
Even when the runtime is finally exhausted, the fair_server is
restarted immediately.
In the end, IRQs, IPIs, and fair tasks can take over the whole system,
leaving no chance for RT tasks to run.
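
The difference between the two policies can be sketched with a toy
user-space model. This is only an illustration, not the patch itself:
struct toy_server, enum toy_policy and both functions are hypothetical,
and restarting the period from "now" when the timer cannot be armed is
a simplifying assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical policy switch; not a kernel symbol. */
enum toy_policy { TOY_AS_IS, TOY_TO_BE };

/* Toy stand-in for the server state discussed in this thread. */
struct toy_server {
    uint64_t deadline;  /* absolute deadline, ns */
    uint64_t period;    /* replenishment period, ns */
    uint64_t budget;    /* runtime granted per period, ns */
    uint64_t runtime;   /* remaining runtime, ns */
    bool throttled;
    bool timer_armed;
};

/* Called when the server has used up its runtime at time "now". */
static void toy_on_runtime_exhausted(struct toy_server *s, uint64_t now,
                                     enum toy_policy policy)
{
    s->throttled = true;
    s->runtime = 0;

    if (s->deadline > now) {
        /* Normal case: the replenishment instant is still ahead. */
        s->timer_armed = true;
        return;
    }

    /* Timer start "failed": begin a fresh period from now (assumed). */
    s->deadline = now + s->period;
    s->runtime = s->budget;

    if (policy == TOY_AS_IS) {
        /* Unthrottle at once; the server keeps running. */
        s->throttled = false;
    } else {
        /* Stay throttled; the timer unthrottles later. */
        s->timer_armed = true;
    }
}

/* Timer callback: the throttled window is over. */
static void toy_on_timer(struct toy_server *s)
{
    s->timer_armed = false;
    s->throttled = false;
}
```

In the as-is branch the server is never throttled, so there is no
window for RT tasks. In the to-be branch the server stays throttled
until toy_on_timer() runs, which is the interval RT tasks can use.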
> I would like to understand exactly what we are trying to fix before
> deciding how to fix it, sorry if I am being dense. :-)
>
> Thanks,
> Juri
>