[RFC PATCH v2 1/2] topology: Represent clusters of CPUs within a die.

Valentin Schneider valentin.schneider at arm.com
Tue Dec 1 11:03:46 EST 2020


On 01/12/20 02:59, Barry Song wrote:
> That means the cost to transfer ownership of a cacheline between CPUs
> within a cluster is lower than between CPUs in different clusters on
> the same die. Hence, it can make sense to tell the scheduler to use
> the cache affinity of the cluster to make better decisions on thread
> migration.
>
> This patch simply exposes this information to userspace libraries
> like hwloc by providing cluster_cpus and related sysfs attributes.
> PoC of HWLOC support at [2].
>
> Note this patch only handles the ACPI case.
>
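
For reference, a userspace consumer of such attributes might look like
the sketch below. It assumes a list-format attribute (something like
cluster_cpus_list; the quoted text only names cluster_cpus, so that name
and its location are assumptions) using the kernel's usual cpulist
syntax, e.g. "0-3,8". parse_cpulist() is a hypothetical helper, not
part of hwloc or the patch.

```python
def parse_cpulist(text):
    """Parse a kernel-style cpulist string such as "0-3,8" into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # tolerate empty input or stray commas
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))  # ranges are inclusive
        else:
            cpus.add(int(part))
    return cpus


# Hypothetical attribute content for CPU0's cluster.
cluster_of_cpu0 = parse_cpulist("0-3,8\n")
```

On a real system the string would come from reading the attribute file
under the CPU's topology directory in sysfs, wherever the patch ends up
placing it.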

AIUI this requires PPTT to describe your system like so:

 {Processor nodes}             {Caches}

       [Node0] ----------------> [L3]
          ^
          |
      [Cluster0] ---------------> []
          ^
          |
        [CPU0] ------------> [L1] -> [L2]

which is a bit odd, because there is an intermediate level without any
private resources. I suppose right now this is the only way to describe
this kind of cache topology via PPTT, but is that widespread?
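
To make the oddity concrete, here is a minimal model of the diagram
above: processor nodes with a parent pointer and a list of private
caches. ProcNode and ancestors() are made-up names for illustration;
this is not how the kernel's PPTT parser is structured.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ProcNode:
    name: str
    parent: Optional["ProcNode"] = None
    caches: List[str] = field(default_factory=list)  # private resources


def ancestors(node):
    """Return the chain from a leaf node up to the root, leaf first."""
    chain = []
    while node is not None:
        chain.append(node)
        node = node.parent
    return chain


# Same shape as the diagram: Node0 owns L3, CPU0 owns L1 and L2,
# and Cluster0 sits in between with nothing attached.
node0 = ProcNode("Node0", caches=["L3"])
cluster0 = ProcNode("Cluster0", parent=node0)
cpu0 = ProcNode("CPU0", parent=cluster0, caches=["L1", "L2"])

chain_names = [n.name for n in ancestors(cpu0)]
levels_without_caches = [n.name for n in ancestors(cpu0) if not n.caches]
```

Walking up from CPU0, the only level with no private resources is the
cluster one, which is exactly the level the patch wants to expose.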


Now, looking at the Ampere eMAG's PPTT, this has a "similar" shape. The
topology is private L1, L2 shared by pairs of CPUs, shared L3 [1].

If I parse the PPTT thing right this is encoded as:

 {Processor nodes}            {Caches}

      [Cluster0] -------------> ([L3] not present in my PPTT for some reason)
          ^
          |
      [  Pair0  ] ------------> [L2]
        ^     ^
        |     |
        |  [CPU1] ------------> [L1]
      [CPU0] -----------------> [L1]

So you could spin the same story there, where first scanning the pair
and then the cluster could help.
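
As a rough sketch of what "pair first, then cluster" means as a search
order: given the spans a CPU belongs to, innermost first, visit the
closest candidates before widening. The CPU numbers and span sizes
below are invented, and scan_order() is a hypothetical helper, not the
scheduler's actual idle-CPU search.

```python
def scan_order(cpu, spans):
    """Order candidate CPUs by topological distance from `cpu`.

    `spans` is a list of CPU sets, innermost first, each containing `cpu`.
    """
    seen = {cpu}
    order = []
    for span in spans:
        # Visit only the CPUs this level adds over the previous ones.
        order.extend(sorted(span - seen))
        seen |= span
    return order


pair0 = {0, 1}            # CPUs sharing an L2 (assumed numbering)
cluster0 = {0, 1, 2, 3}   # CPUs under the same cluster node (assumed size)

order_for_cpu0 = scan_order(0, [pair0, cluster0])
```

Here CPU1 is tried before CPU2 and CPU3, since it shares the L2 with
CPU0.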

[1]: https://en.wikichip.org/wiki/ampere_computing/emag/8180

> Special consideration is needed for SMT processors, where it is
> necessary to move 2 levels up the hierarchy from the leaf nodes
> (thus skipping the processor core level).
>
> Currently the ID provided is the offset of the Processor
> Hierarchy Nodes Structure within PPTT.  Whilst this is unique
> it is not terribly elegant so alternative suggestions welcome.
>

Skimming through the spec, this sounds like something the ID structure
(Type 2) could be used for. However, in v1 Jonathan and Sudeep talked
about UIDs / DSDT; any news on that?

> Note that arm64 / ACPI does not provide any means of identifying
> a die level in the topology but that may be unrelated to the cluster
> level.
>
> [1] ACPI Specification 6.3 - section 5.2.29.1 processor hierarchy node
>     structure (Type 0)
> [2] https://github.com/hisilicon/hwloc/tree/linux-cluster
>
> Signed-off-by: Jonathan Cameron <Jonathan.Cameron at huawei.com>
> Signed-off-by: Barry Song <song.bao.hua at hisilicon.com>


