[RFC PATCH 1/1] sched/deadline: Fix RT task potential starvation when expiry time passed

Juri Lelli juri.lelli at redhat.com
Fri Jun 20 08:22:17 PDT 2025


On 20/06/25 11:00, Kuyo Chang wrote:

...

> "DL replenish lagged too much" means the fair_server took much longer
> than expected to use up its running time,
> so the deadline fell way behind the clock (which is also why
> start_dl_timer() failed). 
> In this situation, just replenishing one dl_period isn’t enough to
> catch up.
>  
> A corner case is when there are too many IRQs or IPIs in the system.
> In this case, runtime gets consumed very slowly, and the fair_server
> keeps running without being throttled.
> Even when the runtime is finally exhausted, the fair_server is
> restarted immediately.
> In the end, IRQs, IPIs, and fair tasks can take over the whole system,
> leaving RT tasks no chance to run.

Thanks for the additional explanation.

The way I understand it now is the following (of course please correct
me if I am still not getting it :)

- a dl_server is actively servicing NORMAL tasks, but suffers a lot of IRQ
  load and cannot make much progress
- it does make progress anyway, but it only reaches the throttle path in
  update_curr_dl_se() when its current deadline is already behind rq_clock()
- the dl_runtime_exceeded() branch is entered, but start_dl_timer() fails
  because the computed activation time (act) is already in the past
- enqueue_dl_entity(REPLENISH) calls replenish_dl_entity(), which tries to
  add runtime and advance the deadline, but time has moved on so far that
  the deadline is still behind rq_clock(), and so "DL replenish ..." is
  printed
- replenish_dl_new_period() resets runtime and deadline from the current
  clock and the dl_server is put back to run (so it continues to run
  over/starve FIFO tasks)

It looks like your proposed fix might work in this particular corner
case, but I am not 100% comfortable with not trying to replenish
properly (catch up with runtime) at all. I wonder if we might then start
missing some other corner case. Maybe we could try to catch this
particular corner case before even attempting to start the dl_timer,
since we know it will fail, and do something at that point?

Thanks,
Juri
