[Linuxarm] Re: [RFC PATCH v4 3/3] scheduler: Add cluster scheduler level for x86

Song Bao Hua (Barry Song) song.bao.hua at hisilicon.com
Mon Mar 8 22:30:33 GMT 2021



> -----Original Message-----
> From: Tim Chen [mailto:tim.c.chen at linux.intel.com]
> Sent: Thursday, March 4, 2021 7:34 AM
> To: Peter Zijlstra <peterz at infradead.org>; Song Bao Hua (Barry Song)
> <song.bao.hua at hisilicon.com>
> Cc: catalin.marinas at arm.com; will at kernel.org; rjw at rjwysocki.net;
> vincent.guittot at linaro.org; bp at alien8.de; tglx at linutronix.de;
> mingo at redhat.com; lenb at kernel.org; dietmar.eggemann at arm.com;
> rostedt at goodmis.org; bsegall at google.com; mgorman at suse.de;
> msys.mizuma at gmail.com; valentin.schneider at arm.com;
> gregkh at linuxfoundation.org; Jonathan Cameron <jonathan.cameron at huawei.com>;
> juri.lelli at redhat.com; mark.rutland at arm.com; sudeep.holla at arm.com;
> aubrey.li at linux.intel.com; linux-arm-kernel at lists.infradead.org;
> linux-kernel at vger.kernel.org; linux-acpi at vger.kernel.org; x86 at kernel.org;
> xuwei (O) <xuwei5 at huawei.com>; Zengtao (B) <prime.zeng at hisilicon.com>;
> guodong.xu at linaro.org; yangyicong <yangyicong at huawei.com>; Liguozhu (Kenneth)
> <liguozhu at hisilicon.com>; linuxarm at openeuler.org; hpa at zytor.com
> Subject: [Linuxarm] Re: [RFC PATCH v4 3/3] scheduler: Add cluster scheduler
> level for x86
> 
> 
> 
> On 3/2/21 2:30 AM, Peter Zijlstra wrote:
> > On Tue, Mar 02, 2021 at 11:59:40AM +1300, Barry Song wrote:
> >> From: Tim Chen <tim.c.chen at linux.intel.com>
> >>
> >> There are x86 CPU architectures (e.g. Jacobsville) where L2 cache
> >> is shared among a cluster of cores instead of being exclusive
> >> to one single core.
> >
> > Isn't that most atoms one way or another? Tremont seems to have it per 4
> > cores, but earlier it was per 2 cores.
> >
> 
> Yes, older Atoms have 2 cores sharing L2.  I probably should
> rephrase my comments to not leave the impression that sharing
> L2 among cores is new for Atoms.
> 
> Tremont based Atom CPUs increase the possible load imbalance
> further, with 4 cores per L2 instead of 2.  And with more cores
> overall on a die, the chance increases of packing running tasks
> onto a few clusters while leaving others empty on light/medium
> loaded systems.  We did see this effect on Jacobsville.
> 
> So load balancing between the L2 clusters is more
> useful on Tremont based Atom CPUs compared to the older Atoms.

It seems sensible that the more CPUs we have in a cluster, the more
we need the kernel to be aware of the cluster's existence.

Tim, is it possible for you to bring up cpu_cluster_mask and
cluster_sibling for x86 so that the topology can be represented
in sysfs and used by the scheduler? Your patch seems to lack this
part.
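
To illustrate what userspace could do once the cluster topology is
visible in sysfs, here is a minimal sketch. It assumes each CPU exposes
its cluster siblings as a kernel-style CPU list string (for example in a
file such as topology/cluster_cpus_list; that name is an assumption
here, not something this patch provides). The helper names and the
8-CPU example data are hypothetical.

```python
def parse_cpulist(text):
    """Parse a kernel-style CPU list such as "0-3,8,10-11" into a sorted list."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # tolerate empty input or trailing commas
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def group_clusters(cluster_lists):
    """Turn {cpu: cpulist string} into a sorted list of distinct clusters."""
    clusters = {tuple(parse_cpulist(text)) for text in cluster_lists.values()}
    return [list(cluster) for cluster in sorted(clusters)]


# Hypothetical data: 8 CPUs, 4 CPUs sharing each L2 cluster.
example = {cpu: ("0-3" if cpu < 4 else "4-7") for cpu in range(8)}
print(group_clusters(example))
```

On a real system the strings would be read from sysfs instead of being
hard-coded.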

BTW, I wonder if x86 could improve your KMP_AFFINITY by leveraging
the cluster topology level:
https://software.intel.com/content/www/us/en/develop/documentation/cpp-compiler-developer-guide-and-reference/top/optimization-and-programming-guide/openmp-support/openmp-library-support/thread-affinity-interface-linux-and-windows.html

KMP_AFFINITY has thread affinity modes such as compact and scatter.
It seems "compact" and "scatter" could also use the cluster
information; as you can see, we are struggling with the same
"compact" vs. "scatter" issues in this patchset :-)
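
As a rough illustration of how cluster information could feed the two
placement styles, here is a minimal sketch. It is not how the Intel
OpenMP runtime implements KMP_AFFINITY; the function names and the
two-cluster layout are made up for the example.

```python
def compact(clusters, nthreads):
    """Fill one cluster completely before moving on to the next."""
    order = [cpu for cluster in clusters for cpu in cluster]
    return order[:nthreads]


def scatter(clusters, nthreads):
    """Spread threads round-robin across clusters."""
    order = []
    depth = max(len(cluster) for cluster in clusters)
    for i in range(depth):
        for cluster in clusters:
            if i < len(cluster):
                order.append(cluster[i])
    return order[:nthreads]


# Hypothetical layout: two L2 clusters of 4 CPUs each.
clusters = [[0, 1, 2, 3], [4, 5, 6, 7]]
print(compact(clusters, 4))  # all four threads share one L2
print(scatter(clusters, 4))  # two threads per L2
```

With 4 threads, compact keeps everything in one cluster (shared L2,
more contention), while scatter uses both clusters (more cache per
thread, less sharing).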

Thanks
Barry

