[PATCH] sched: dynamic config sd_flags if described in DT

Dietmar Eggemann dietmar.eggemann at arm.com
Thu Mar 17 05:10:02 PDT 2022


On 16/03/2022 03:46, 王擎 wrote:
> 
>> (1) Can you share more information about your CPU topology?
>>
>> I guess it is a single DSU (DynamIQ Shared Unit) ARMv9 system with 8
>> CPUs? So L3 spans over [CPU0..CPU7].
>>
>> You also mentioned complexes. Am I right in assuming that [CPU0..CPU3]
>> are Cortex-A510 cores where each 2 CPUs share a complex?
>>
>> What kind of uarch are the CPUs in [CPU4..CPU7]? Are they Cortex-A510's
>> as well? I'm not sure after reading your email:
> 
> Yes, Android systems are currently used default_domain with wrong sd_flags, 
> take Qualcomm SM8450 as an example, the CPU and cache topology(1+3+4):

Ah, your system looks like this:

      .---------------.
CPU   |0 1 2 3 4 5 6 7|
      +---------------+
uarch |l l l l m m m b| (so called tri-gear: little, medium, big)
      +---------------+
  L2  |   |   | | | | |
      +---------------+
  L3  |<--         -->|
      +---------------+
      |<-- cluster -->|
      +---------------+
      |<--   DSU   -->|
      '---------------'

> |                           DSU                            |
> |           cluster0         |       cluster1     |cluster2|

^^^ Those aren't real clusters, hence the name <Phantom> SD. The cluster
is [CPU0...CPU7]. Android uses Phantom SD to subgroup CPUs with the same
uarch. That's why you get your MC->DIE SD's on your system and
SHARE_PKG_RESOURCES (ShPR) on MC, rather DIE.

Note, you should already have an asymmetric SD hierarchy. CPU7 should
only have DIE not MC! Each CPU has its own SD hierarchy!

> | core0  core1  core2  core3 |  core4 core5 core6 | core7  |
> |   complex0  |   complex1   |  ------------------------   |
> |   L2 cache  |   L2 cache   |   L2  |  L2 |  L2  |   L2   |
> |                         L3 cache                         |
> 
> The sched domain now:
> DIE[0-7]  (no SD_SHARE_PKG_RESOURCES)
> MC[0-3][4-6][7] (SD_SHARE_PKG_RESOURCES)
> 
> The sched domain should be:
> DIE[0-7]  (SD_SHARE_PKG_RESOURCES)
> MC[0-3][4-6][7] (no SD_SHARE_PKG_RESOURCES)

First remember, using Phantom SD in Android is already a hack. Normally
your system should only have an MC SD for each CPU (with ShPR).

Now, if you want to move ShPR from MC to DIE then a custom topology
table should do it, i.e. you don't have to change any generic task
scheduler code.

static inline int cpu_cpu_flags(void)
{
       return SD_SHARE_PKG_RESOURCES;
}

static struct sched_domain_topology_level custom_topology[] = {
#ifdef CONFIG_SCHED_SMT
        { cpu_smt_mask, cpu_smt_flags, SD_INIT_NAME(SMT) },
#endif

#ifdef CONFIG_SCHED_CLUSTER
        { cpu_clustergroup_mask, cpu_cluster_flags, SD_INIT_NAME(CLS) },
#endif

#ifdef CONFIG_SCHED_MC
        { cpu_coregroup_mask, SD_INIT_NAME(MC) },
                            ^^^^
#endif
        { cpu_cpu_mask, cpu_cpu_flags, SD_INIT_NAME(DIE) },
                        ^^^^^^^^^^^^^
        { NULL, },
};

set_sched_topology(custom_topology);

> *CLS[0-1][2-3](SD_SHARE_PKG_RESOURCES)

But why do you want to have yet another SD underneath MC for CPU0-CPU3?
sd_llc is assigned to the highest ShPR SD, which would be DIE.

>> https://lkml.kernel.org/r/SL2PR06MB30828CF9FF2879AFC9DC53D2BD0C9@SL2PR06MB3082.apcprd06.prod.outlook.com
>>
>> You might run into the issue that individual CPUs of your system see a
>> different SD hierarchy in case that [CPU4..CPU7] aren't Cortex-A510's,
>> i.e. CPUs not sharing complexes.
>>
>> (2) Related to your MC Sched Domain (SD) layer:
>>
>> If you have a single DSU ARMv9 system, then in Linux kernel mainline you
>> shouldn't have sub-clustering of [CPU0..CPU3] and [CPU4...CPU7].
>>
>> I.e. the cpu-map entry in your dts file should only list cores, not
>> clusters.
> 
> But in fact we will, as mentioned above.

OK, so your system needs this `fake` sub-grouping on uarch boundaries.
Probably because of (out-of-tree) Android/platform code? Have you tried
to run on a clean mainline SD hierarchy (only MC)? Is the Phantom SD
still required?

>> I know that in Android the cluster entries are used to sub-group
>> different uarch CPUs in an asymmetric CPU capacity system (a.k.a. Arm
>> DynamIQ and Phantom domains) but this is eclipsing the true L3 (LLC)
>> information and is not "supported" (in the sense of "used") in mainline.
>>
>> But I have a hard time to see what [CPU0..CPU3] or [CPU4..CPU7] are
>> shareing in your system.
> 
> They share L3 cache, but no share L2 which only shared within complex.

I should have written: What do they share exclusively? I can only see
that they ([CPU0..CPU3], [CPU4..CPU6], [CPU7]) share uarch exclusively.
Which relates to this fake (Phantom) SD. L3 is shared between all CPUs.

>> (3) Why do you want this different SD hierarchy?
>>
>> I assume in mainline your system will have a single SD which is MC (w/o
>> the Phantom domain approach from Android).
>>
>> You mentioned cpus_share_cache(). Or is it the extra SD level which
>> changes the behaviour of CFS load-balancing? I'm just wondering since
>> EAS wouldn't be affected here. I'm sure I can understand this better
>> once we know more about your CPU topology.
> 
> What I want to do is :
> 1.Config the right sd_llc to sd, better to get it dynamically from DT

So this should be ShPR on DIE. You have 2 choices here. Use flat MC SD
(mainline) or Phantom SD's + custom_topology.

> 2.Benefit from the shared cache(L2) of the complex
> i.e. when we look for sibling idle CPU, prior select the L2 shared CPU(inner complex) if L3 is all shared.

But cpus_share_cache() operates on sd_llc_id which relates to DIE SD
even for [CPU0..CPU3]?



More information about the linux-riscv mailing list