[RFC PATCH v4 1/3] topology: Represent clusters of CPUs within a die.

Song Bao Hua (Barry Song) song.bao.hua at hisilicon.com
Mon Mar 15 03:11:06 GMT 2021



> -----Original Message-----
> From: Song Bao Hua (Barry Song)
> Sent: Tuesday, March 2, 2021 12:00 PM
> To: tim.c.chen at linux.intel.com; catalin.marinas at arm.com; will at kernel.org;
> rjw at rjwysocki.net; vincent.guittot at linaro.org; bp at alien8.de;
> tglx at linutronix.de; mingo at redhat.com; lenb at kernel.org; peterz at infradead.org;
> dietmar.eggemann at arm.com; rostedt at goodmis.org; bsegall at google.com;
> mgorman at suse.de
> Cc: msys.mizuma at gmail.com; valentin.schneider at arm.com;
> gregkh at linuxfoundation.org; Jonathan Cameron <jonathan.cameron at huawei.com>;
> juri.lelli at redhat.com; mark.rutland at arm.com; sudeep.holla at arm.com;
> aubrey.li at linux.intel.com; linux-arm-kernel at lists.infradead.org;
> linux-kernel at vger.kernel.org; linux-acpi at vger.kernel.org; x86 at kernel.org;
> xuwei (O) <xuwei5 at huawei.com>; Zengtao (B) <prime.zeng at hisilicon.com>;
> guodong.xu at linaro.org; yangyicong <yangyicong at huawei.com>; Liguozhu (Kenneth)
> <liguozhu at hisilicon.com>; linuxarm at openeuler.org; hpa at zytor.com; Jonathan
> Cameron <jonathan.cameron at huawei.com>; Song Bao Hua (Barry Song)
> <song.bao.hua at hisilicon.com>
> Subject: [RFC PATCH v4 1/3] topology: Represent clusters of CPUs within a die.
> 
> From: Jonathan Cameron <Jonathan.Cameron at huawei.com>
> 
> Both ACPI and DT provide the ability to describe additional layers of
> topology between that of individual cores and higher level constructs
> such as the level at which the last level cache is shared.
> In ACPI this can be represented in PPTT as a Processor Hierarchy
> Node Structure [1] that is the parent of the CPU cores and in turn
> has a parent Processor Hierarchy Node Structure representing
> a higher level of topology.
> 
> For example Kunpeng 920 has 6 or 8 clusters in each NUMA node, and each
> cluster has 4 CPUs. All clusters share L3 cache data, but each cluster
> has a local L3 tag. On the other hand, each cluster shares some
> internal system bus.
> 
> +-----------------------------------+                          +---------+
> |  +------+    +------+            +---------------------------+         |
> |  | CPU0 |    | CPU1 |             |    +-----------+         |         |
> |  +------+    +------+             |    |           |         |         |
> |                                   +----+    L3     |         |         |
> |  +------+    +------+   cluster   |    |    tag    |         |         |
> |  | CPU2 |    | CPU3 |             |    |           |         |         |
> |  +------+    +------+             |    +-----------+         |         |
> |                                   |                          |         |
> +-----------------------------------+                          |         |
> +-----------------------------------+                          |         |
> |  +------+    +------+             +--------------------------+         |
> |  |      |    |      |             |    +-----------+         |         |
> |  +------+    +------+             |    |           |         |         |
> |                                   |    |    L3     |         |         |
> |  +------+    +------+             +----+    tag    |         |         |
> |  |      |    |      |             |    |           |         |         |
> |  +------+    +------+             |    +-----------+         |         |
> |                                   |                          |         |
> +-----------------------------------+                          |   L3    |
>                                                                |   data  |
> +-----------------------------------+                          |         |
> |  +------+    +------+             |    +-----------+         |         |
> |  |      |    |      |             |    |           |         |         |
> |  +------+    +------+             +----+    L3     |         |         |
> |                                   |    |    tag    |         |         |
> |  +------+    +------+             |    |           |         |         |
> |  |      |    |      |            ++    +-----------+         |         |
> |  +------+    +------+            |---------------------------+         |
> +-----------------------------------|                          |         |
> +-----------------------------------|                          |         |
> |  +------+    +------+            +---------------------------+         |
> |  |      |    |      |             |    +-----------+         |         |
> |  +------+    +------+             |    |           |         |         |
> |                                   +----+    L3     |         |         |
> |  +------+    +------+             |    |    tag    |         |         |
> |  |      |    |      |             |    |           |         |         |
> |  +------+    +------+             |    +-----------+         |         |
> |                                   |                          |         |
> +-----------------------------------+                          |         |
> +-----------------------------------+                          |         |
> |  +------+    +------+             +--------------------------+         |
> |  |      |    |      |             |   +-----------+          |         |
> |  +------+    +------+             |   |           |          |         |
> |                                   |   |    L3     |          |         |
> |  +------+    +------+             +---+    tag    |          |         |
> |  |      |    |      |             |   |           |          |         |
> |  +------+    +------+             |   +-----------+          |         |
> |                                   |                          |         |
> +-----------------------------------+                          |         |
> +-----------------------------------+                         ++         |
> |  +------+    +------+             +--------------------------+         |
> |  |      |    |      |             |  +-----------+           |         |
> |  +------+    +------+             |  |           |           |         |
> |                                   |  |    L3     |           |         |
> |  +------+    +------+             +--+    tag    |           |         |
> |  |      |    |      |             |  |           |           |         |
> |  +------+    +------+             |  +-----------+           |         |
> |                                   |                          +---------+
> +-----------------------------------+
> 
> That means the cost to transfer ownership of a cacheline between CPUs
> within a cluster is lower than between CPUs in different clusters on
> the same die. Hence, it can make sense to tell the scheduler to use
> the cache affinity of the cluster to make better decisions on thread
> migration.
> 
> This patch simply exposes this information to userspace libraries
> like hwloc by providing cluster_cpus and related sysfs attributes.
> PoC of HWLOC support at [2].
> 
> Note this patch only handles the ACPI case.
> 
> Special consideration is needed for SMT processors, where it is
> necessary to move 2 levels up the hierarchy from the leaf nodes
> (thus skipping the processor core level).
> 
> Currently the ID provided is the offset of the Processor
> Hierarchy Node Structure within PPTT.  Whilst this is unique
> it is not terribly elegant so alternative suggestions welcome.
> 
> Note that arm64 / ACPI does not provide any means of identifying
> a die level in the topology but that may be unrelated to the cluster
> level.
> 
> [1] ACPI Specification 6.3 - section 5.2.29.1 processor hierarchy node
>     structure (Type 0)
> [2] https://github.com/hisilicon/hwloc/tree/linux-cluster
> 
> Signed-off-by: Jonathan Cameron <Jonathan.Cameron at huawei.com>
> Signed-off-by: Barry Song <song.bao.hua at hisilicon.com>
> ---
>   -v4:
>   * used acpi_cpu_id for acpi_find_processor_node (addressing Masa's comment)
> 
>  Documentation/admin-guide/cputopology.rst | 26 +++++++++++--
>  arch/arm64/kernel/topology.c              |  2 +
>  drivers/acpi/pptt.c                       | 63 +++++++++++++++++++++++++++++++
>  drivers/base/arch_topology.c              | 14 +++++++
>  drivers/base/topology.c                   | 10 +++++
>  include/linux/acpi.h                      |  5 +++
>  include/linux/arch_topology.h             |  5 +++
>  include/linux/topology.h                  |  6 +++
>  8 files changed, 127 insertions(+), 4 deletions(-)
> 
> diff --git a/Documentation/admin-guide/cputopology.rst b/Documentation/admin-guide/cputopology.rst
> index b90dafc..f9d3745 100644
> --- a/Documentation/admin-guide/cputopology.rst
> +++ b/Documentation/admin-guide/cputopology.rst
> @@ -24,6 +24,12 @@ core_id:
>  	identifier (rather than the kernel's).  The actual value is
>  	architecture and platform dependent.
> 
> +cluster_id:
> +
> +	the Cluster ID of cpuX.  Typically it is the hardware platform's
> +	identifier (rather than the kernel's).  The actual value is
> +	architecture and platform dependent.
> +
>  book_id:
> 
>  	the book ID of cpuX. Typically it is the hardware platform's
> @@ -56,6 +62,14 @@ package_cpus_list:
>  	human-readable list of CPUs sharing the same physical_package_id.
>  	(deprecated name: "core_siblings_list")
> 
> +cluster_cpus:
> +
> +	internal kernel map of CPUs within the same cluster.
> +
> +cluster_cpus_list:
> +
> +	human-readable list of CPUs within the same cluster.
> +
>  die_cpus:
> 
>  	internal kernel map of CPUs within the same die.
> @@ -96,11 +110,13 @@ these macros in include/asm-XXX/topology.h::
> 
>  	#define topology_physical_package_id(cpu)
>  	#define topology_die_id(cpu)
> +	#define topology_cluster_id(cpu)
>  	#define topology_core_id(cpu)
>  	#define topology_book_id(cpu)
>  	#define topology_drawer_id(cpu)
>  	#define topology_sibling_cpumask(cpu)
>  	#define topology_core_cpumask(cpu)
> +	#define topology_cluster_cpumask(cpu)
>  	#define topology_die_cpumask(cpu)
>  	#define topology_book_cpumask(cpu)
>  	#define topology_drawer_cpumask(cpu)
> @@ -116,10 +132,12 @@ not defined by include/asm-XXX/topology.h:
> 
>  1) topology_physical_package_id: -1
>  2) topology_die_id: -1
> -3) topology_core_id: 0
> -4) topology_sibling_cpumask: just the given CPU
> -5) topology_core_cpumask: just the given CPU
> -6) topology_die_cpumask: just the given CPU
> +3) topology_cluster_id: -1
> +4) topology_core_id: 0
> +5) topology_sibling_cpumask: just the given CPU
> +6) topology_core_cpumask: just the given CPU
> +7) topology_cluster_cpumask: just the given CPU
> +8) topology_die_cpumask: just the given CPU
> 
>  For architectures that don't support books (CONFIG_SCHED_BOOK) there are no
>  default definitions for topology_book_id() and topology_book_cpumask().
> diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
> index f6faa69..fe076b3 100644
> --- a/arch/arm64/kernel/topology.c
> +++ b/arch/arm64/kernel/topology.c
> @@ -103,6 +103,8 @@ int __init parse_acpi_topology(void)
>  			cpu_topology[cpu].thread_id  = -1;
>  			cpu_topology[cpu].core_id    = topology_id;
>  		}
> +		topology_id = find_acpi_cpu_topology_cluster(cpu);
> +		cpu_topology[cpu].cluster_id = topology_id;
>  		topology_id = find_acpi_cpu_topology_package(cpu);
>  		cpu_topology[cpu].package_id = topology_id;
> 
> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
> index 4ae9335..11f8b02 100644
> --- a/drivers/acpi/pptt.c
> +++ b/drivers/acpi/pptt.c
> @@ -737,6 +737,69 @@ int find_acpi_cpu_topology_package(unsigned int cpu)
>  }
> 
>  /**
> + * find_acpi_cpu_topology_cluster() - Determine a unique CPU cluster value
> + * @cpu: Kernel logical CPU number
> + *
> + * Determine a topology unique cluster ID for the given CPU/thread.
> + * This ID can then be used to group peers, which will have matching ids.
> + *
> + * The cluster, if present, is the level of topology above CPUs. In a
> + * multi-thread CPU, it will be the level above the CPU, not the thread.
> + * It may not exist in single CPU systems. In simple multi-CPU systems,
> + * it may be equal to the package topology level.
> + *
> + * Return: -ENOENT if the PPTT doesn't exist, the CPU cannot be found
> + * or there is no topology level above the CPU.
> + * Otherwise returns a value which represents the cluster for this CPU.
> + */
> +
> +int find_acpi_cpu_topology_cluster(unsigned int cpu)
> +{
> +	struct acpi_table_header *table;
> +	acpi_status status;
> +	struct acpi_pptt_processor *cpu_node, *cluster_node;
> +	u32 acpi_cpu_id;
> +	int retval;
> +	int is_thread;
> +
> +	status = acpi_get_table(ACPI_SIG_PPTT, 0, &table);
> +	if (ACPI_FAILURE(status)) {
> +		acpi_pptt_warn_missing();
> +		return -ENOENT;
> +	}
> +
> +	acpi_cpu_id = get_acpi_id_for_cpu(cpu);
> +	cpu_node = acpi_find_processor_node(table, acpi_cpu_id);
> +	if (cpu_node == NULL || !cpu_node->parent) {
> +		retval = -ENOENT;
> +		goto put_table;
> +	}
> +
> +	is_thread = cpu_node->flags & ACPI_PPTT_ACPI_PROCESSOR_IS_THREAD;
> +	cluster_node = fetch_pptt_node(table, cpu_node->parent);
> +	if (cluster_node == NULL) {
> +		retval = -ENOENT;
> +		goto put_table;
> +	}
> +	if (is_thread) {
> +		if (!cluster_node->parent) {
> +			retval = -ENOENT;
> +			goto put_table;
> +		}
> +		cluster_node = fetch_pptt_node(table, cluster_node->parent);
> +		if (cluster_node == NULL) {
> +			retval = -ENOENT;
> +			goto put_table;
> +		}
> +	}
> +	retval = ACPI_PTR_DIFF(cluster_node, table);
> +put_table:
> +	acpi_put_table(table);
> +
> +	return retval;
> +}
> +
> +/**
>   * find_acpi_cpu_topology_hetero_id() - Get a core architecture tag
>   * @cpu: Kernel logical CPU number
>   *
> diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
> index de8587c..3079232 100644
> --- a/drivers/base/arch_topology.c
> +++ b/drivers/base/arch_topology.c
> @@ -506,6 +506,11 @@ const struct cpumask *cpu_coregroup_mask(int cpu)
>  	return core_mask;
>  }
> 
> +const struct cpumask *cpu_clustergroup_mask(int cpu)
> +{
> +	return &cpu_topology[cpu].cluster_sibling;
> +}
> +
>  void update_siblings_masks(unsigned int cpuid)
>  {
>  	struct cpu_topology *cpu_topo, *cpuid_topo = &cpu_topology[cpuid];
> @@ -523,6 +528,11 @@ void update_siblings_masks(unsigned int cpuid)
>  		if (cpuid_topo->package_id != cpu_topo->package_id)
>  			continue;
> 
> +		if (cpuid_topo->cluster_id == cpu_topo->cluster_id) {
> +			cpumask_set_cpu(cpu, &cpuid_topo->cluster_sibling);
> +			cpumask_set_cpu(cpuid, &cpu_topo->cluster_sibling);
> +		}
> +

I am seeing that a machine without clusters still ends up with a cluster,
so I guess we need the change below:

diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
index 3079232ed8ed..ccd4b3b5cc6f 100644
--- a/drivers/base/arch_topology.c
+++ b/drivers/base/arch_topology.c
@@ -528,7 +528,8 @@ void update_siblings_masks(unsigned int cpuid)
                if (cpuid_topo->package_id != cpu_topo->package_id)
                        continue;

-               if (cpuid_topo->cluster_id == cpu_topo->cluster_id) {
+               if (cpuid_topo->cluster_id == cpu_topo->cluster_id &&
+                   cpu_topo->cluster_id != -1) {
                        cpumask_set_cpu(cpu, &cpuid_topo->cluster_sibling);
                        cpumask_set_cpu(cpuid, &cpu_topo->cluster_sibling);
                }
@@ -568,6 +569,7 @@ void __init reset_cpu_topology(void)
                struct cpu_topology *cpu_topo = &cpu_topology[cpu];

                cpu_topo->thread_id = -1;
+               cpu_topo->cluster_id = -1;
                cpu_topo->core_id = -1;
                cpu_topo->package_id = -1;
                cpu_topo->llc_id = -1;

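To illustrate why the extra check matters, here is a minimal userspace
sketch of the cluster part of update_siblings_masks(). It is only a model,
not the kernel code: the demo_* names, the 8-CPU size and the uint32_t
bitmask standing in for struct cpumask are all made up for illustration,
and it assumes every CPU in the package carries the same "unset"
cluster_id.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS_DEMO 8	/* hypothetical size, for illustration only */

/* Cut-down stand-in for struct cpu_topology (hypothetical). */
struct demo_topology {
	int package_id;
	int cluster_id;
	uint32_t cluster_sibling;	/* bit n set => CPU n is a cluster sibling */
};

/*
 * Model of the cluster handling in update_siblings_masks().
 * check_valid == 0: original condition (cluster ids match).
 * check_valid == 1: condition with the additional "!= -1" test.
 */
static void demo_update_cluster_siblings(struct demo_topology *topo, int ncpus,
					 int cpuid, int check_valid)
{
	struct demo_topology *cpuid_topo = &topo[cpuid];
	int cpu;

	assert(ncpus <= 32);	/* the demo mask is a single 32-bit word */

	for (cpu = 0; cpu < ncpus; cpu++) {
		struct demo_topology *cpu_topo = &topo[cpu];

		if (cpuid_topo->package_id != cpu_topo->package_id)
			continue;
		if (cpuid_topo->cluster_id != cpu_topo->cluster_id)
			continue;
		if (check_valid && cpu_topo->cluster_id == -1)
			continue;

		cpuid_topo->cluster_sibling |= UINT32_C(1) << cpu;
		cpu_topo->cluster_sibling |= UINT32_C(1) << cpuid;
	}
}

/*
 * Bring up all CPUs of a single-package machine where every CPU has the
 * same cluster_id, then return the cluster sibling mask of CPU0.
 */
static uint32_t demo_cluster_mask_of_cpu0(int cluster_id, int check_valid)
{
	struct demo_topology topo[NR_CPUS_DEMO];
	int cpu;

	for (cpu = 0; cpu < NR_CPUS_DEMO; cpu++) {
		topo[cpu].package_id = 0;
		topo[cpu].cluster_id = cluster_id;
		topo[cpu].cluster_sibling = 0;
	}
	for (cpu = 0; cpu < NR_CPUS_DEMO; cpu++)
		demo_update_cluster_siblings(topo, NR_CPUS_DEMO, cpu, check_valid);

	return topo[0].cluster_sibling;
}
```

In this model, demo_cluster_mask_of_cpu0(-1, 0) has all eight bits set,
i.e. the whole package looks like one cluster, while
demo_cluster_mask_of_cpu0(-1, 1) is empty. A real cluster id such as 0
still groups the CPUs when the check is enabled.
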
Hi Jonathan, thoughts?

Thanks
Barry



