[RFC PATCH v3 0/2] scheduler: expose the topology of clusters and add cluster scheduler
Song Bao Hua (Barry Song)
song.bao.hua at hisilicon.com
Wed Feb 3 06:32:32 EST 2021
> -----Original Message-----
> From: Tim Chen [mailto:tim.c.chen at linux.intel.com]
> Sent: Friday, January 8, 2021 12:17 PM
> To: Song Bao Hua (Barry Song) <song.bao.hua at hisilicon.com>;
> valentin.schneider at arm.com; catalin.marinas at arm.com; will at kernel.org;
> rjw at rjwysocki.net; vincent.guittot at linaro.org; lenb at kernel.org;
> gregkh at linuxfoundation.org; Jonathan Cameron <jonathan.cameron at huawei.com>;
> mingo at redhat.com; peterz at infradead.org; juri.lelli at redhat.com;
> dietmar.eggemann at arm.com; rostedt at goodmis.org; bsegall at google.com;
> mgorman at suse.de; mark.rutland at arm.com; sudeep.holla at arm.com;
> aubrey.li at linux.intel.com
> Cc: linux-arm-kernel at lists.infradead.org; linux-kernel at vger.kernel.org;
> linux-acpi at vger.kernel.org; linuxarm at openeuler.org; xuwei (O)
> <xuwei5 at huawei.com>; Zengtao (B) <prime.zeng at hisilicon.com>; tiantao (H)
> <tiantao6 at hisilicon.com>
> Subject: Re: [RFC PATCH v3 0/2] scheduler: expose the topology of clusters and
> add cluster scheduler
>
>
>
> On 1/6/21 12:30 AM, Barry Song wrote:
> > The ARM64 server chip Kunpeng 920 has 6 clusters in each NUMA node, and each
> > cluster has 4 CPUs. All clusters share the L3 cache data, while each cluster
> > has its own local L3 tag. In addition, the CPUs within each cluster share
> > some internal system bus. This means cache affinity is much stronger within
> > one cluster than across clusters.
> >
> > +---------------------------+    +-----------------------------+
> > | +------+ +------+         |    |  +-----------+              |
> > | | CPU0 | | CPU1 |         |    |  |           |              |
> > | +------+ +------+         +----+--+  L3 tag   |              |
> > | +------+ +------+ cluster |    |  |           |              |
> > | | CPU2 | | CPU3 |         |    |  +-----------+              |
> > | +------+ +------+         |    |                             |
> > +---------------------------+    |                             |
> > +---------------------------+    |                             |
> > | +------+ +------+         |    |  +-----------+              |
> > | |      | |      |         |    |  |           |              |
> > | +------+ +------+         +----+--+  L3 tag   |              |
> > | +------+ +------+         |    |  |           |              |
> > | |      | |      |         |    |  +-----------+              |
> > | +------+ +------+         |    |                             |
> > +---------------------------+    |           L3 data           |
> > +---------------------------+    |                             |
> > | +------+ +------+         |    |  +-----------+              |
> > | |      | |      |         |    |  |           |              |
> > | +------+ +------+         +----+--+  L3 tag   |              |
> > | +------+ +------+         |    |  |           |              |
> > | |      | |      |         |    |  +-----------+              |
> > | +------+ +------+         |    |                             |
> > +---------------------------+    |                             |
> > +---------------------------+    |                             |
> > | +------+ +------+         |    |  +-----------+              |
> > | |      | |      |         |    |  |           |              |
> > | +------+ +------+         +----+--+  L3 tag   |              |
> > | +------+ +------+         |    |  |           |              |
> > | |      | |      |         |    |  +-----------+              |
> > | +------+ +------+         |    |                             |
> > +---------------------------+    |                             |
> > +---------------------------+    |                             |
> > | +------+ +------+         |    |  +-----------+              |
> > | |      | |      |         |    |  |           |              |
> >
> >
>
> There is a similar need for clustering on x86. Some x86 cores share an L2
> cache, which is similar to the cluster in Kunpeng 920 (e.g. on Jacobsville
> there are 6 clusters of 4 Atom cores, each cluster sharing a separate L2, and
> all 24 cores sharing the L3). Having a sched domain at the L2 cluster level
> helps spread load among L2 domains. This reduces L2 cache contention and
> helps performance in low to moderate load scenarios.
>
> The cluster detection mechanism will need to be based on L2 cache sharing
> in this case. I suggest making cluster detection CPU-architecture dependent
> so that both the ARM64 and x86 use cases can be accommodated.
>
> Attached below are two RFC patches for creating an x86 L2 cache sched
> domain, without the code that selects an idle CPU on wakeup. They are
> similar enough in concept to Barry's patches that we should have a single
> patchset that accommodates both use cases.
Hi Tim,

Agreed on this. Hopefully the RFC v4 I am preparing will cover your case.
>
> Thanks.
>
> Tim
Thanks
Barry