EEVDF regression still exists
K Prateek Nayak
kprateek.nayak at amd.com
Fri May 2 20:34:28 PDT 2025
Hello Linus,
On 5/2/2025 11:22 PM, Linus Torvalds wrote:
> On Fri, 2 May 2025 at 10:25, Prundeanu, Cristian <cpru at amazon.com> wrote:
>>
>> Another, more recent observation is that 6.15-rc4 has worse performance than
>> rc3 and earlier kernels. Maybe that can help narrow down the cause?
>> I've added the perf reports for rc3 and rc2 in the same location as before.
>
> The only _scheduler_ change that looks relevant is commit bbce3de72be5
> ("sched/eevdf: Fix se->slice being set to U64_MAX and resulting
> crash"). Which does affect the slice calculation, although supposedly
> only under special circumstances.
>
> Of course, it could be something else.
Since it is the only !SCHED_EXT change in kernel/sched, Cristian could
perhaps try reverting it on top of v6.15-rc4 and check whether the
benchmark results jump back to the v6.15-rc3 level, to rule that single
change in or out. It could very well be something else entirely.
>
> For example, we have an AMD performance regression in general due to
> _another_ CPU leak mitigation issue, but that predates rc3 (happened
> during the merge window), so that one isn't relevant, but maybe
> something else is..
>
> Although honestly, that slice calculation still looks just plain odd.
> It defaults the slice to zero, so if none of the 'break' conditions in
> the first loop happens, it will reset the slice to that zero value and
I believe setting slice to U64_MAX was the actual problem. Previously,
the slice was initialized as:

    cfs_rq = group_cfs_rq(se);
    slice = cfs_rq_min_slice(cfs_rq);

If the "se" was delayed, it basically means that group_cfs_rq() had no
tasks on it, so cfs_rq_min_slice() would return "~0ULL", which would
then get propagated and could lead to bad math.
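For illustration only, here is a minimal standalone sketch of how a
"minimum over an empty runqueue" ends up as ~0ULL and poisons later
arithmetic. The types and names below are hypothetical stand-ins, not
the actual kernel code:

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical stand-in for a group runqueue; not the kernel's struct cfs_rq. */
    struct fake_cfs_rq {
            int nr_queued;          /* number of runnable entities */
            uint64_t min_slice;     /* smallest slice among the queued entities */
    };

    /*
     * Mimics the "minimum over a possibly empty set" pattern: with nothing
     * queued, only the neutral element ~0ULL (U64_MAX) is left to return.
     */
    static uint64_t fake_min_slice(const struct fake_cfs_rq *cfs_rq)
    {
            uint64_t min_slice = ~0ULL;

            if (cfs_rq->nr_queued)
                    min_slice = cfs_rq->min_slice;

            return min_slice;
    }

    int main(void)
    {
            /* A delayed entity's group runqueue: no tasks queued on it. */
            struct fake_cfs_rq empty_group = { .nr_queued = 0 };

            uint64_t slice = fake_min_slice(&empty_group);

            /*
             * Any later arithmetic on this "slice" (scaling, sums, deadline
             * computation) now operates on U64_MAX and silently wraps around.
             */
            printf("slice     = %llu\n", (unsigned long long)slice);
            printf("slice * 2 = %llu (wrapped)\n", (unsigned long long)(slice * 2));
            return 0;
    }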
> then the
>
> slice = cfs_rq_min_slice(cfs_rq);
>
> in that second loop looks like it might just pick up that zero value again.
If the first loop does not break, not even at the
"if (cfs_rq->load.weight)" check, it basically means that there are no
tasks / delayed entities queued all the way up to the root cfs_rq, so
the slices shouldn't matter. The enqueue of the next task will correct
the slices for the whole queued hierarchy. A rough sketch of that
two-loop shape follows below.
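Roughly, this is the shape of the two-loop walk I have in mind; the
types, names and the walk itself are simplified stand-ins for
illustration, not the actual dequeue path:

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical, heavily simplified stand-ins; not the kernel's structures. */
    struct toy_cfs_rq {
            struct toy_cfs_rq *parent;  /* NULL at the root of the hierarchy */
            unsigned long load_weight;  /* non-zero if something is still queued */
            uint64_t slice;             /* min slice propagated from below */
    };

    static uint64_t toy_min_slice(const struct toy_cfs_rq *cfs_rq)
    {
            return cfs_rq->load_weight ? cfs_rq->slice : ~0ULL;
    }

    static void toy_dequeue_walk(struct toy_cfs_rq *cfs_rq)
    {
            uint64_t slice = 0;

            /*
             * First loop: dequeue level by level and stop at the first level
             * that still has load afterwards; everything above it stays
             * queued and only needs its slice refreshed.
             */
            for (; cfs_rq; cfs_rq = cfs_rq->parent) {
                    /* ... dequeue the entity at this level ... */
                    if (cfs_rq->load_weight) {
                            slice = toy_min_slice(cfs_rq);
                            cfs_rq = cfs_rq->parent;
                            break;
                    }
            }

            /*
             * If the first loop never hit the break, every level up to the
             * root is now empty and cfs_rq is NULL: whatever is left in
             * "slice" is harmless, because the next enqueue recomputes the
             * slices for the whole hierarchy.
             */

            /* Second loop: refresh the slice propagated to the remaining levels. */
            for (; cfs_rq; cfs_rq = cfs_rq->parent) {
                    cfs_rq->slice = slice;
                    slice = toy_min_slice(cfs_rq);
            }
    }

    int main(void)
    {
            struct toy_cfs_rq root  = { .parent = NULL,  .load_weight = 0 };
            struct toy_cfs_rq group = { .parent = &root, .load_weight = 0 };

            /*
             * Dequeue from a hierarchy that is about to become completely
             * empty: the first loop never breaks, so the second loop does
             * nothing.
             */
            toy_dequeue_walk(&group);
            return 0;
    }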
>
> I clearly don't understand the code.
>
> Linus
--
Thanks and Regards,
Prateek