[PATCH v3 6/7] arm64: topology: Enable ACPI/PPTT based CPU topology.
Jeremy Linton
jeremy.linton at arm.com
Mon Oct 23 14:26:33 PDT 2017
Hi,
On 10/20/2017 02:55 PM, Jeffrey Hugo wrote:
> On 10/20/2017 10:14 AM, Jeremy Linton wrote:
>> Hi,
>>
>> On 10/20/2017 04:14 AM, Lorenzo Pieralisi wrote:
>>> On Thu, Oct 19, 2017 at 11:13:27AM -0500, Jeremy Linton wrote:
>>>> On 10/19/2017 10:56 AM, Lorenzo Pieralisi wrote:
>>>>> On Thu, Oct 12, 2017 at 02:48:55PM -0500, Jeremy Linton wrote:
>>>>>> Propagate the topology information from the PPTT tree to the
>>>>>> cpu_topology array. We can get the thread_id, core_id and
>>>>>> cluster_id by assuming certain levels of the PPTT tree correspond
>>>>>> to those concepts. The package_id is flagged in the tree and can be
>>>>>> found by passing an arbitrarily large level to
>>>>>> setup_acpi_cpu_topology()
>>>>>> which terminates its search when it finds an ACPI node flagged
>>>>>> as the physical package. If the tree doesn't contain enough
>>>>>> levels to represent all of thread/core/COD/package, then the package
>>>>>> id will be used for the missing levels.
>>>>>>
>>>>>> Since server/ACPI machines are more likely to be multisocket and
>>>>>> NUMA,
>>>>>
>>>>> I think this stuff is vague enough already, so to start with I would
>>>>> drop patches 4 and 5 and stop assuming what machines are more likely
>>>>> to ship with ACPI than with DT.
>>>>>
>>>>> I am just saying, for the umpteenth time, that these levels have no
>>>>> architectural meaning _whatsoever_; a level is a hierarchy concept
>>>>> with no architectural meaning attached.
>>>>
>>>> ?
>>>>
>>>> Did anyone say anything about that? No, I think the only thing being
>>>> guaranteed here is that the kernel's physical_id maps to an
>>>> ACPI-defined socket, which seems to be the mindset of pretty much the
>>>> entire !arm64 community, meaning they are optimizing their software
>>>> and the kernel with that concept in mind.
>>>>
>>>> Are you denying the existence of non-uniformity between threads
>>>> running on different physical sockets?
>>>
>>> No, I have not explained my POV clearly, apologies.
>>>
>>> AFAIK, the kernel currently deals with 2 (3 - if SMT) topology layers.
>>>
>>> 1) thread
>>> 2) core
>>> 3) package
>>>
>>> What I wanted to say is that, to simplify this series, you do not need
>>> to introduce the COD topology level, since it is just another arbitrary
>>> topology level (i.e. there is no way you can pinpoint which level
>>> corresponds to COD with PPTT - or DT for the sake of this discussion)
>>> that would not be used in the kernel (apart from the big.LITTLE cpufreq
>>> driver and the PSCI checker, whose usage of
>>> topology_physical_package_id() is questionable anyway).
>>
>> Oh! But I'm at a loss as to what to do with those two users if I use
>> the node that has the physical socket flag set as the "cluster_id"
>> in the topology.
>>
>> Granted, this being ACPI, I don't expect the cpufreq driver to be
>> active (given CPPC), and the PSCI checker might be ignored? Even so,
>> it's a bit of a misnomer for what is actually happening. Are we good
>> with this?
>>
>>
>>>
>>> PPTT allows you to define what level corresponds to a package, use
>>> it to initialize the package topology level (that on ARM internal
>>> variables we call cluster) and be done with it.
>>>
>>> I do not think that adding another topology level improves anything as
>>> far as ACPI topology detection is concerned, you are not able to use it
>>> in the scheduler or from userspace to group CPUs anyway.
>>
>> Correct, and AFAIK, after having poked a bit at the scheduler, it's
>> sort of redundant as the generic cache-sharing levels are more useful
>> anyway.
>
> What do you mean, it can't be used? We expect a follow-up series which
> uses PPTT to define scheduling domains/groups.
>
> The scheduler supports 4 types of levels, with an arbitrary number of
> instances of each - NUMA, DIE (package, usually not used with NUMA), MC
> (multicore, typically cores which share resources like cache), SMT
> (threads).
It turns out to be pretty easy to map individual PPTT "levels" to MC
layers simply by creating a custom sched_domain_topology_level array and
populating it with an equal number of MC layers. The only thing that
changes is the "mask" portion of each entry.
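Roughly, something like the below (just a sketch; the pptt_levelN_mask()
helpers are hypothetical stand-ins for whatever per-level sibling masks
the PPTT parse ends up producing):

/* hypothetical per-level mask helpers, each returning the set of CPUs
 * sharing the given PPTT level with 'cpu' */
static const struct cpumask *pptt_level0_mask(int cpu);
static const struct cpumask *pptt_level1_mask(int cpu);

static struct sched_domain_topology_level pptt_topology[] = {
#ifdef CONFIG_SCHED_SMT
	{ cpu_smt_mask, cpu_smt_flags, SD_INIT_NAME(SMT) },
#endif
	/* one MC-style entry per PPTT level; only the mask differs */
	{ pptt_level0_mask, cpu_core_flags, SD_INIT_NAME(MC) },
	{ pptt_level1_mask, cpu_core_flags, SD_INIT_NAME(MC) },
	{ cpu_cpu_mask, SD_INIT_NAME(DIE) },
	{ NULL, },
};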
Whether that is good/bad vs just using a topology like:
static struct sched_domain_topology_level arm64_topology[] = {
#ifdef CONFIG_SCHED_SMT
	{ cpu_smt_mask, cpu_smt_flags, SD_INIT_NAME(SMT) },
#endif
	{ cpu_cluster_mask, cpu_core_flags, SD_INIT_NAME(CLU) },
#ifdef CONFIG_SCHED_MC
	{ cpu_coregroup_mask, cpu_core_flags, SD_INIT_NAME(MC) },
#endif
	{ cpu_cpu_mask, SD_INIT_NAME(DIE) },
	{ NULL, },
};
and then using it on a successful ACPI/PPTT parse, along with a new
cpu_cluster_mask, isn't clear to me either. Particularly if one then goes
in and starts changing the flags as well, for starters the
"cpu_core_flags" to cpu_smt_flags.
But as mentioned, I think this is a follow-on patch which meshes with
patches 4/5 here.
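For completeness, the hookup I'd expect on the arm64 side is roughly the
following (again only a sketch; parse_acpi_topology() is a placeholder
name for the PPTT walk):

void __init init_cpu_topology(void)
{
	reset_cpu_topology();

	/*
	 * Only switch away from the default scheduler topology when
	 * the ACPI/PPTT parse actually succeeded.
	 */
	if (!acpi_disabled && !parse_acpi_topology())
		set_sched_topology(arm64_topology);
}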
>
> Our particular platform has a single socket/package, with multiple
> "clusters", each cluster consisting of multiple cores that share caches.
> We represent all of this in PPTT, and expect it to be used. Leaf
> nodes are cores. The level above is the cluster. The top level is the
> package. We expect eventually (and understand that Jeremy is not
> tackling this with his current series) that clusters get represented by MC
> so that migrated processes prefer their cache-shared siblings, and the
> entire package is represented by DIE.
>
> This will have to come from PPTT since you can't use core_siblings to
> derive this. Additionally, if we had multiple layers of clustering, we
> would expect each layer to be represented by MC. Topology.c has none of
> this support today.
>
> PPTT can refer to SLIT/SRAT to determine if a hierarchy level
> corresponds to the "Cluster-on-Die" concept of other architectures
> (which end up as NUMA nodes in NUMA scheduling domains).
>
> What PPTT will have to do is parse the tree(s), determine what each
> level is - SMT, MC, NUMA, DIE - and then use set_sched_topology() so
> that the scheduler can build up groups/domains appropriately.
>
>
> Jeremy, we've tested v3 on our platform. The topology part works as
> expected; we no longer see lstopo reporting sockets where there are
> none, but the scheduling groups are broken (expected). Caches still
> don't work right (no sizes reported, and the sched caches are not
> attributed to the cores). We will likely have additional comments as we
> delve into it.
>>
>>>
>>> Does this answer your question ?
>> Yes, other than what to do with the two drivers.
>>
>>>
>>> Thanks,
>>> Lorenzo
>>>
>>
>
>