EEVDF regression still exists

K Prateek Nayak kprateek.nayak at amd.com
Fri May 2 20:34:28 PDT 2025


Hello Linus,

On 5/2/2025 11:22 PM, Linus Torvalds wrote:
> On Fri, 2 May 2025 at 10:25, Prundeanu, Cristian <cpru at amazon.com> wrote:
>>
>> Another, more recent observation is that 6.15-rc4 has worse performance than
>> rc3 and earlier kernels. Maybe that can help narrow down the cause?
>> I've added the perf reports for rc3 and rc2 in the same location as before.
> 
> The only _scheduler_ change that looks relevant is commit bbce3de72be5
> ("sched/eevdf: Fix se->slice being set to U64_MAX and resulting
> crash"). Which does affect the slice calculation, although supposedly
> only under special circumstances.
>
> Of course, it could be something else.

Since it is the only !SCHED_EXT change in kernel/sched, Cristian could
perhaps try reverting it on top of v6.15-rc4 and check whether the
benchmark results jump back to the v6.15-rc3 level, to rule that single
change out. It could very well be something else, though.

> 
> For example, we have an AMD performance regression in general due to
> _another_ CPU leak mitigation issue, but that predates rc3 (happened
> during the merge window), so that one isn't relevant, but maybe
> something else is..
> 
> Although honestly, that slice calculation still looks just plain odd.
> It defaults the slice to zero, so if none of the 'break' conditions in
> the first loop happens, it will reset the slice to that zero value and

I believe setting the slice to U64_MAX was the actual problem.
Previously, the slice was initialized as:

       cfs_rq = group_cfs_rq(se);
       slice = cfs_rq_min_slice(cfs_rq);

If the "se" was delayed, it basically means that the group_cfs_rq() had
no tasks on it and cfs_rq_min_slice() would return "~0ULL" which will
get propagated and can lead to bad math.
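
To illustrate the "bad math" part, here is a small userspace sketch
(purely illustrative; "toy_rq", "toy_min_slice" and friends are made-up
names, not the fair.c helpers) of how an empty queue leaves the ~0ULL
sentinel in place and how any later addition with it simply wraps:

    /*
     * Userspace sketch, not kernel code: with nothing queued, the
     * "no constraint" sentinel of ~0ULL survives the min-slice walk.
     */
    #include <stdio.h>
    #include <stdint.h>

    struct toy_rq { int nr_queued; uint64_t queued_min_slice; };

    static uint64_t toy_min_slice(const struct toy_rq *rq)
    {
            uint64_t min_slice = ~0ULL;     /* "no constraint" sentinel */

            if (rq->nr_queued)
                    min_slice = rq->queued_min_slice;
            return min_slice;               /* empty rq -> ~0ULL survives */
    }

    int main(void)
    {
            struct toy_rq empty = { .nr_queued = 0 };
            uint64_t slice = toy_min_slice(&empty);
            uint64_t vruntime = 1000000;    /* some running clock, in ns */

            /* deadline-style math wraps once the sentinel leaks in */
            printf("slice            = %llu\n", (unsigned long long)slice);
            printf("vruntime + slice = %llu (wrapped)\n",
                   (unsigned long long)(vruntime + slice));
            return 0;
    }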

> then the
> 
>          slice = cfs_rq_min_slice(cfs_rq);
> 
> in that second loop looks like it might just pick up that zero value again.

If the first loop does not break, even on "if (cfs_rq->load.weight)",
it basically means that there are no tasks / delayed entities queued
all the way up to the root cfs_rq, so the slices shouldn't matter.

The next enqueue will correct the slices for the whole queued
hierarchy.
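
For what it is worth, here is a small userspace sketch of that
reasoning (again purely illustrative; "toy_se" and "load_weight" are
stand-in names, not the fair.c fields): if the walk up the parent chain
never hits a level with remaining load, the whole hierarchy is idle and
the per-level slices are never consumed until the next enqueue rewrites
them:

    /*
     * Userspace sketch, not kernel code: walk up a parent chain and
     * stop at the first level that still has load. If the loop runs
     * to completion, nothing is queued anywhere up to the root.
     */
    #include <stdio.h>
    #include <stdint.h>
    #include <stddef.h>

    struct toy_se {
            struct toy_se *parent;
            uint64_t load_weight;   /* stand-in for cfs_rq->load.weight */
            uint64_t slice;         /* stale value; unread while idle   */
    };

    int main(void)
    {
            struct toy_se root = { .parent = NULL,  .load_weight = 0 };
            struct toy_se mid  = { .parent = &root, .load_weight = 0 };
            struct toy_se leaf = { .parent = &mid,  .load_weight = 0 };
            struct toy_se *se;

            /* pass 1: break at the first level that still has load */
            for (se = &leaf; se; se = se->parent)
                    if (se->load_weight)
                            break;

            if (!se)    /* no break: hierarchy is idle until next enqueue */
                    printf("hierarchy empty, stale slices never consumed\n");
            return 0;
    }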

> 
> I clearly don't understand the code.
> 
>               Linus

-- 
Thanks and Regards,
Prateek



