[PATCH] arm64: topology: Stop using MPIDR for topology information
prime.zeng at hisilicon.com
Wed Sep 2 21:44:24 EDT 2020
> -----Original Message-----
> From: Valentin Schneider [mailto:valentin.schneider at arm.com]
> Sent: Wednesday, September 02, 2020 6:52 PM
> To: Zengtao (B)
> Cc: linux-kernel at vger.kernel.org; linux-arm-kernel at lists.infradead.org;
> Catalin Marinas; Will Deacon; Sudeep Holla; Robin Murphy; Jeremy Linton;
> Dietmar Eggemann; Morten Rasmussen
> Subject: Re: [PATCH] arm64: topology: Stop using MPIDR for topology
> On 02/09/20 04:24, B wrote:
> > Hi Valentin:
> >> -----Original Message-----
> >> From: Valentin Schneider [mailto:valentin.schneider at arm.com]
> >> Sent: Saturday, August 29, 2020 9:00 PM
> >> To: linux-kernel at vger.kernel.org; linux-arm-kernel at lists.infradead.org
> >> Cc: Catalin Marinas; Will Deacon; Sudeep Holla; Robin Murphy; Jeremy
> >> Linton; Dietmar Eggemann; Morten Rasmussen; Zengtao (B)
> >> Subject: [PATCH] arm64: topology: Stop using MPIDR for topology
> >> information
> >> In the absence of ACPI or DT topology data, we fallback to haphazardly
> >> decoding *something* out of MPIDR. Sadly, the contents of that
> >> are
> >> mostly unusable due to the implementation leniancy and things like Aff0
> >> having to be capped to 15 (despite being encoded on 8 bits).
> >> Consider a simple system with a single package of 32 cores, all under
> >> same LLC. We ought to be shoving them in the same core_sibling mask,
> >> but
> >> MPIDR is going to look like:
> >> | CPU | 0 | ... | 15 | 16 | ... | 31 |
> >> |------+---+-----+----+----+-----+----+
> >> | Aff0 | 0 | ... | 15 | 0 | ... | 15 |
> >> | Aff1 | 0 | ... | 0 | 1 | ... | 1 |
> >> | Aff2 | 0 | ... | 0 | 0 | ... | 0 |
> >> Which will eventually yield
> >> core_sibling(0-15) == 0-15
> >> core_sibling(16-31) == 16-31
> >> NUMA woes
> >> =========
> >> If we try to play games with this and set up NUMA boundaries within
> >> groups of 16 cores via e.g. QEMU:
> >> # Node0: 0-9; Node1: 10-19
> >> $ qemu-system-aarch64 <blah> \
> >> -smp 20 -numa node,cpus=0-9,nodeid=0 -numa
> >> node,cpus=10-19,nodeid=1
> >> The scheduler's MC domain (all CPUs with same LLC) is going to be built
> >> arch_topology.c::cpu_coregroup_mask()
> >> In there we try to figure out a sensible mask out of the topology
> >> information we have. In short, here we'll pick the smallest of NUMA or
> >> core sibling mask.
> >> node_mask(CPU9) == 0-9
> >> core_sibling(CPU9) == 0-15
> >> MC mask for CPU9 will thus be 0-9, not a problem.
> >> node_mask(CPU10) == 10-19
> >> core_sibling(CPU10) == 0-15
> >> MC mask for CPU10 will thus be 10-19, not a problem.
> >> node_mask(CPU16) == 10-19
> >> core_sibling(CPU16) == 16-19
> >> MC mask for CPU16 will thus be 16-19... Uh oh. CPUs 16-19 are in two
> >> different unique MC spans, and the scheduler has no idea what to make
> >> that. That triggers the WARN_ON() added by commit
> >> ccf74128d66c ("sched/topology: Assert non-NUMA topology masks
> >> don't (partially) overlap")
> >> Fixing MPIDR-derived topology
> >> =============================
> >> We could try to come up with some cleverer scheme to figure out which
> >> the available masks to pick, but really if one of those masks resulted
> >> MPIDR then it should be discarded because it's bound to be bogus.
> >> I was hoping to give MPIDR a chance for SMT, to figure out which
> >> are
> >> in the same core using Aff1-3 as core ID, but Sudeep and Robin pointed
> >> to me that there are systems out there where *all* cores have
> >> values in their higher affinity fields (e.g. RK3288 has "5" in all of its
> >> cores' MPIDR.Aff1), which would expose a bogus core ID to userspace.
> >> Stop using MPIDR for topology information. When no other source of
> >> topology
> >> information is available, mark each CPU as its own core and its NUMA
> >> node
> >> as its LLC domain.
> > I agree with your idea to remove the topology functionality of MPIDR ,
> > but I think we need also consider ARM32 and GIC.
> Could you please elaborate? This change doesn't impact arch_topology, so
> only arm64 is affected.
Yes, this change only affects arm64, my question is that do we need to
leverage it to arm32 since arm32 got the same issue.
And for GIC we are also using MPIDR for the topology info, but I am sure
It's got the same issue or not, just a suggestion to have a look.
More information about the linux-arm-kernel