[RFC PATCH v3 0/2] scheduler: expose the topology of clusters and add cluster scheduler

Song Bao Hua (Barry Song) song.bao.hua at hisilicon.com
Mon Jan 25 05:50:06 EST 2021



> -----Original Message-----
> From: Dietmar Eggemann [mailto:dietmar.eggemann at arm.com]
> Sent: Wednesday, January 13, 2021 12:00 AM
> To: Morten Rasmussen <morten.rasmussen at arm.com>; Tim Chen
> <tim.c.chen at linux.intel.com>
> Cc: Song Bao Hua (Barry Song) <song.bao.hua at hisilicon.com>;
> valentin.schneider at arm.com; catalin.marinas at arm.com; will at kernel.org;
> rjw at rjwysocki.net; vincent.guittot at linaro.org; lenb at kernel.org;
> gregkh at linuxfoundation.org; Jonathan Cameron <jonathan.cameron at huawei.com>;
> mingo at redhat.com; peterz at infradead.org; juri.lelli at redhat.com;
> rostedt at goodmis.org; bsegall at google.com; mgorman at suse.de;
> mark.rutland at arm.com; sudeep.holla at arm.com; aubrey.li at linux.intel.com;
> linux-arm-kernel at lists.infradead.org; linux-kernel at vger.kernel.org;
> linux-acpi at vger.kernel.org; linuxarm at openeuler.org; xuwei (O)
> <xuwei5 at huawei.com>; Zengtao (B) <prime.zeng at hisilicon.com>; tiantao (H)
> <tiantao6 at hisilicon.com>
> Subject: Re: [RFC PATCH v3 0/2] scheduler: expose the topology of clusters and
> add cluster scheduler
> 
> On 11/01/2021 10:28, Morten Rasmussen wrote:
> > On Fri, Jan 08, 2021 at 12:22:41PM -0800, Tim Chen wrote:
> >>
> >>
> >> On 1/8/21 7:12 AM, Morten Rasmussen wrote:
> >>> On Thu, Jan 07, 2021 at 03:16:47PM -0800, Tim Chen wrote:
> >>>> On 1/6/21 12:30 AM, Barry Song wrote:
> 
> [...]
> 
> >> I think it is going to depend on the workload.  If there are dependent
> >> tasks that communicate with one another, putting them together
> >> in the same cluster will be the right thing to do to reduce communication
> >> costs.  On the other hand, if the tasks are independent, putting them together
> >> on the same cluster will increase resource contention and spreading them out
> >> will be better.
> >
> > Agree. That is exactly where I'm coming from. This is all about the task
> > placement policy. We generally tend to spread tasks to avoid resource
> > contention, SMT and caches, which seems to be what you are proposing to
> > extend. I think that makes sense given it can produce significant
> > benefits.
> >
> >>
> >> Any thoughts on what is the right clustering "tag" to use to clump
> >> related tasks together?
> >> Cgroup? Pid? Tasks with same mm?
> >
> > I think this is the real question. I think the closest thing we have at
> > the moment is the wakee/waker flip heuristic. This seems to be related.
> > Perhaps the wake_affine tricks can serve as starting point?
> 
> wake_wide() switches between packing (select_idle_sibling(), llc_size
> CPUs) and spreading (find_idlest_cpu(), all CPUs).
> 
> AFAICS, since none of the sched domains set SD_BALANCE_WAKE, currently
> all wakeups are (llc-)packed.

Sorry for the late response. I have been busy with some other topology
issues recently.

Regarding "all wakeups are (llc-)packed":
it seems you mean that want_affine currently only affects new_cpu,
and that on the wake-up path we always go to select_idle_sibling() rather
than find_idlest_cpu(), since nobody sets SD_BALANCE_WAKE in any
sched_domain?
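
To make that reading concrete, here is a rough userspace model of the
domain walk quoted below. It is only a sketch, not kernel code: the
toy_domain struct, the TOY_SD_BALANCE_WAKE value and both function names
are made up for illustration, and the want_affine/wake_affine() handling
is left out entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag value, for illustration only (not the kernel's). */
#define TOY_SD_BALANCE_WAKE 0x1

/* Toy stand-in for struct sched_domain: only flags and the parent link. */
struct toy_domain {
	int flags;
	struct toy_domain *parent;
};

/*
 * Mirror of the quoted loop: walk from the lowest domain upwards and
 * remember the highest domain that has sd_flag set. Returns NULL when
 * no domain sets the flag.
 */
static struct toy_domain *toy_pick_domain(struct toy_domain *lowest, int sd_flag)
{
	struct toy_domain *sd = NULL;
	struct toy_domain *tmp;

	assert(sd_flag != 0);	/* a zero flag could never match */

	for (tmp = lowest; tmp; tmp = tmp->parent)
		if (tmp->flags & sd_flag)
			sd = tmp;

	return sd;
}

/*
 * Returns 1 if a wakeup would take the spreading path (find_idlest_cpu()
 * in the kernel), 0 if it would take the packing path
 * (select_idle_sibling()).
 */
static int toy_wakeup_spreads(struct toy_domain *lowest)
{
	return toy_pick_domain(lowest, TOY_SD_BALANCE_WAKE) != NULL;
}
```

With no domain carrying the flag, toy_pick_domain() returns NULL and the
model always lands on the packing path, which is how I understand your
statement.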

> 
>  select_task_rq_fair()
> 
>    for_each_domain(cpu, tmp)
> 
>      if (tmp->flags & sd_flag)
>        sd = tmp;
> 
> 
> In case we would like to further distinguish between llc-packing and
> even narrower (cluster or MC-L2)-packing, we would introduce a 2. level
> packing vs. spreading heuristic further down in sis().

I didn't get your point about "2. level packing". Could you describe it
in more detail? It seems you mean we need separate avg_scan_cost
calculations and sched_feat(SIS_) handling for the cluster (or MC-L2),
since the cluster and the llc are not at the same physical level?

> 
> IMHO, Barry's current implementation doesn't do this right now. Instead
> he's trying to pack on cluster first and if not successful look further
> among the remaining llc CPUs for an idle CPU.

Yes. That is exactly what the current patch is doing.
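
Roughly, and only as a userspace sketch rather than the patch itself, the
search order looks like the code below. The bitmask representation and
the names toy_first_cpu() and toy_select_idle_cpu() are assumptions made
for illustration; the real code works on cpumasks and has scan-cost
limits that are not modelled here.

```c
#include <assert.h>
#include <stdint.h>

/* Return the lowest CPU number set in mask, or -1 if the mask is empty. */
static int toy_first_cpu(uint32_t mask)
{
	int cpu;

	for (cpu = 0; cpu < 32; cpu++)
		if (mask & (UINT32_C(1) << cpu))
			return cpu;

	return -1;
}

/*
 * idle:    CPUs that are currently idle
 * cluster: CPUs in the target's cluster (must be a subset of llc)
 * llc:     CPUs sharing the target's last-level cache
 *
 * Try the cluster first; if it has no idle CPU, look among the
 * remaining llc CPUs. Returns -1 if nothing idle is found in the llc.
 */
static int toy_select_idle_cpu(uint32_t idle, uint32_t cluster, uint32_t llc)
{
	int cpu;

	assert((cluster & ~llc) == 0);

	cpu = toy_first_cpu(idle & cluster);
	if (cpu >= 0)
		return cpu;

	return toy_first_cpu(idle & llc & ~cluster);
}
```

For example, with cluster = 0x0f and llc = 0xffffff, an idle mask of 0x24
(CPUs 2 and 5) picks CPU 2 from the cluster, while 0x20 falls through to
CPU 5 in the rest of the llc.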

Thanks
Barry

