[PATCH V2] sched: topology: make cache topology separate from cpu topology

Sun Mar 13 19:13:06 PDT 2022

>> From: Wang Qing <wangqing at vivo.com>
>> 
>> On some architectures (e.g. ARM64), caches are implemented as shown below:
>> SD(Level 1):          ************ DIE ************
>> SD(Level 0):          **** MC ****    **** MC *****
>> cluster:              **cluster 0**   **cluster 1**
>> cores:                0   1   2   3   4   5   6   7
>> cache(Level 1):       C   C   C   C   C   C   C   C
>> cache(Level 2):       **C**   **C**   **C**   **C**
>> cache(Level 3):       *******shared Level 3********
>> sd_llc_id(current):   0   0   0   0   4   4   4   4
>> sd_llc_id(should be): 0   0   2   2   4   4   6   6
>
>Should cluster 0 and 1 span the same cpu mask as the MCs? Based on how
>you describe the cache above, it seems like what you are looking for
>would be:
>
>(SD DIE level removed in favor of the same span MC)
>SD(Level 1):          ************ MC  ************
>SD(Level 0):          *CLS0*  *CLS1*  *CLS2*  *CLS3* (CONFIG_SCHED_CLUSTER)
>cores:                0   1   2   3   4   5   6   7
>cache(Level 1):       C   C   C   C   C   C   C   C
>cache(Level 2):       **C**   **C**   **C**   **C**
>cache(Level 3):       *******shared Level 3********
>
>Provided cpu_coregroup_mask and cpu_clustergroup_mask return the
>corresponding cpumasks, this should work with the default sched domain
>topology.
>
>It looks to me like the lack of nested cluster support in
>parse_cluster() in drivers/base/arch_topology.c is what needs to be
>updated to accomplish the above. With cpu_topology[cpu].cluster_sibling and
>core_sibling updated to reflect the topology you describe, the rest of
>the sched domains construction would work with the default sched domain
>topology.

A complex (core[0-1]) looks like a nested cluster, but it is not exactly one:
the cores in a complex only share an L2 cache.
parse_cluster() only parses the CPU topology; it does not parse the cache
topology, even if the DT describes it.
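
To illustrate the difference between the two sd_llc_id rows in the diagram, here is a minimal user-space sketch (not the patch itself, and not a kernel API). It assumes a hypothetical helper compute_llc_id() that takes, for each CPU, an arbitrary identifier of the cache instance it uses, and assigns the lowest-numbered CPU sharing that cache as the id.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper for illustration only.
 * cache_of[cpu] identifies the cache instance used by each CPU.
 * llc_id[cpu] becomes the lowest-numbered CPU sharing that cache.
 */
static void compute_llc_id(const int *cache_of, int *llc_id, size_t ncpus)
{
	assert(cache_of && llc_id);

	for (size_t cpu = 0; cpu < ncpus; cpu++) {
		/* Default: the CPU is the first user of its cache. */
		llc_id[cpu] = (int)cpu;

		/* Look for an earlier CPU that shares the same cache. */
		for (size_t other = 0; other < cpu; other++) {
			if (cache_of[other] == cache_of[cpu]) {
				llc_id[cpu] = (int)other;
				break;
			}
		}
	}
}
```

Feeding it the cluster grouping {0,0,0,0,1,1,1,1} gives the "current" row (0 0 0 0 4 4 4 4), while feeding it the L2 grouping {0,0,1,1,2,2,3,3} gives the "should be" row (0 0 2 2 4 4 6 6).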

>I'm not very familiar with DT, especially the cpu-map. Does your DT
>reflect the topology you want to build?

The DT looks like:
cpu-map {
	cluster0 {
		core0 {
			cpu = <&cpu0>;
		};
		core1 {
			cpu = <&cpu1>;
		};
		core2 {
			cpu = <&cpu2>;
		};
		core3 {
			cpu = <&cpu3>;
		};
		doe_dvfs_cl0: doe {
		};
	};

	cluster1 {
		core0 {
			cpu = <&cpu4>;
		};
		core1 {
			cpu = <&cpu5>;
		};
		core2 {
			cpu = <&cpu6>;
		};
		doe_dvfs_cl1: doe {
		};
	};
};

cpus {
		cpu0: cpu@100 {
			next-level-cache = <&L2_1>;
			L2_1: l2-cache {
				compatible = "cache";
				next-level-cache = <&L3_1>;
			};
			L3_1: l3-cache {
				compatible = "cache";
			};
		};

		cpu1: cpu@101 {
			next-level-cache = <&L2_1>;
		};

		cpu2: cpu@102 {
			next-level-cache = <&L2_2>;
			L2_2: l2-cache {
				compatible = "cache";
				next-level-cache = <&L3_1>;
			};
		};

		cpu3: cpu@103 {
			next-level-cache = <&L2_2>;
		};

		cpu4: cpu@100 {
			next-level-cache = <&L2_3>;
			L2_3: l2-cache {
				compatible = "cache";
				next-level-cache = <&L3_1>;
			};
		};

		cpu5: cpu@101 {
			next-level-cache = <&L2_3>;
		};

		cpu6: cpu@102 {
			next-level-cache = <&L2_4>;
			L2_4: l2-cache {
				compatible = "cache";
				next-level-cache = <&L3_1>;
			};
		};

		cpu7: cpu@200 {
			next-level-cache = <&L2_4>;
		};
	};
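
The cache sharing expressed by the next-level-cache phandles above can be sketched in user space as follows. This is only an illustration under stated assumptions: the cpu_l2[] table is copied by hand from the DT labels, and l2_sibling_mask() is a hypothetical name, not an existing kernel function. Real code would walk the device tree nodes instead of comparing label strings.

```c
#include <assert.h>
#include <string.h>

#define NR_CPUS 8

/* next-level-cache label of each CPU, copied by hand from the DT above. */
static const char *const cpu_l2[NR_CPUS] = {
	"L2_1", "L2_1", "L2_2", "L2_2",
	"L2_3", "L2_3", "L2_4", "L2_4",
};

/*
 * Hypothetical helper: return a bitmask of the CPUs whose
 * next-level-cache label matches that of 'cpu' (bit N = CPU N).
 */
static unsigned int l2_sibling_mask(int cpu)
{
	unsigned int mask = 0;

	assert(cpu >= 0 && cpu < NR_CPUS);

	for (int i = 0; i < NR_CPUS; i++) {
		if (strcmp(cpu_l2[i], cpu_l2[cpu]) == 0)
			mask |= 1u << i;
	}

	return mask;
}
```

With this table, CPUs 0-1, 2-3, 4-5 and 6-7 each form a separate L2 sharing group, which is information that the cpu-map node alone does not carry.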

Thanks,
Wang

>
>
>-- 
>Darren Hart
>Ampere Computing / OS and Kernel