[PATCH v10 0/6] ACPI: Support Generic Initiator proximity domains

Jonathan Cameron Jonathan.Cameron at Huawei.com
Fri Sep 18 08:17:01 EDT 2020


On Mon, 7 Sep 2020 22:03:01 +0800
Jonathan Cameron <Jonathan.Cameron at huawei.com> wrote:

> It would be very nice to finally merge this support during this cycle,
> so please take a look.

Hi All,

Just a quick reminder that this set is still looking for review.

Thanks,

Jonathan

> 
> I think we need acks covering x86, ARM and ACPI.  Rafael took a look back
> in November at v5 and was looking for x86 and ARM acks.  Whilst there is
> no ARM specific code left we probably still need an Ack.  If anyone is
> missing from the cc list, please add them.
> 
> Introduces a new type of NUMA node for cases where we want to represent
> the access characteristics of a non-CPU initiator of memory requests,
> as these differ from those of all existing nodes containing CPUs and/or
> memory.
> 
> These Generic Initiators are presented by the node access0 class in
> sysfs in the same way as a CPU.  It seems likely that there will be
> use cases in which the best 'CPU' is desired and Generic Initiators
> should be ignored.  The final few patches in this series introduce
> access1, a new performance class in the sysfs node description which
> presents only CPU to memory relationships.  Test cases for this
> are described below.
> 
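> As a rough illustration of how userspace might consume the new classes (a
> sketch only; the path layout is taken from the sysfs listings later in this
> mail, and node4 is just the example target), reading the CPU-only latency
> to a target node could look like:
> 
>   #include <stdio.h>
> 
>   /* Print the access1 (CPU to memory) read latency for node4.  Purely
>    * illustrative: real code would discover nodes and cope with the
>    * accessX directories being absent. */
>   int main(void)
>   {
>           unsigned int latency;
>           FILE *f = fopen("/sys/bus/node/devices/node4/access1/initiators/read_latency", "r");
> 
>           if (!f)
>                   return 1;
>           if (fscanf(f, "%u", &latency) == 1)
>                   printf("access1 read latency to node4: %u\n", latency);
>           fclose(f);
>           return 0;
>   }
> 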
> Changes since v9:
> Thanks to Bjorn Helgaas for review.
> * Fix ordering of checks in patch 4 so we check the version number first.
> 
> Changes since v8:
> * ifdef protections and stubs to avoid a build error on ia64 (reported by
>   0-day).  I'm assuming no one cares about Generic Initiators on IA64.
> * Update OSC code to ensure we don't claim to support GIs except on x86 and
>   ARM64
> 
> Changes since V7:
> 
> * Now independent from
>   [PATCH v3 0/6]  ACPI: Only create NUMA nodes from entries in SRAT or SRAT emulation
> * Minor documentation tweak.
> * Rebase on v5.9-rc1
> 
> Changes since V6:
> 
> * Rebase on 5.8-rc4 + Dependency as above.
> * Drop the ARM64 specific code. No specific calls are needed on ARM64
>   as the generic node init is done for all nodes, whether or not they
>   have memory.  X86 does memoryless nodes separately from those with
>   memory and hence needs to specifically initialize GI only nodes (a
>   rough sketch follows after this changelog block).
> * Fix up an error in the docs reported by Brice Goglin who also did
>   quite a bit of testing of v5. Thanks!
>   
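> A rough sketch of the x86 side of that (function and node-state names
> follow the patches, but treat the body as illustrative rather than the
> final code):
> 
>   /* Bring GI-only nodes online; they have neither memory nor CPUs, so
>    * the existing memory/CPU node init paths never reach them. */
>   static void __init init_gi_nodes(void)
>   {
>           int nid;
> 
>           for_each_node_state(nid, N_GENERIC_INITIATOR)
>                   if (!node_online(nid))
>                           node_set_online(nid);
>   }
> 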
> Changes since V5:
> 
> 3 new patches:
> * A fix for a subtlety in how ACPI 6.3 changed part of the HMAT table.
> * Introduction of access1 class to represent characteristics between CPU
>   and memory, ignoring GIs unlike access0 which includes them.
> * Docs to describe the new access1 class.
> 
> Note that I ran a number of test cases for the new class which are
> described at the end of this email.
> 
> Changes since V4:
> 
> At Rafael's suggestion:
> 
> Rebase on top of Dan Williams' Specific Purpose Memory series as that
> moves srat.c.  Original patches cherry-picked fine onto mmotm with Dan's
> patches applied.
> 
> Applies to mmotm-2019-09-25 +
> https://lore.kernel.org/linux-acpi/156140036490.2951909.1837804994781523185.stgit@dwillia2-desk3.amr.corp.intel.com/
> [PATCH v4 00/10] EFI Specific Purpose Memory Support
> (note there are some trivial conflicts to deal with when applying
> the SPM series).
> 
> Changes since V3:
> * Rebase.
> 
> Changes since RFC V2:
> * RFC dropped as now we have x86 support, so the lack of guards in the
>   ACPI code etc should now be fine.
> * Added x86 support.  Note this has only been tested on QEMU as I don't have
>   a convenient x86 NUMA machine to play with.  Note that this fitted together
>   rather differently from arm64 so I'm particularly interested in feedback
>   on the two solutions.
> 
> Changes since RFC V1:
> * Fix incorrect interpretation of the ACPI entry noted by Keith Busch
> * Use the acpica headers definitions that are now in mmotm.
> 
> It's worth noting that safely putting a given device in a GI node may
> require changes to the existing drivers, as it's not unusual for them to
> assume a local memory or processor core is present.  There may be further
> constraints not yet covered by this patch set.
> 
> Original cover letter...
> 
> ACPI 6.3 introduced a new entity that can be part of a NUMA proximity domain.
> It may share such a domain with the existing options (memory, CPU etc) but it
> may also exist on its own.
> 
> The intent is to allow the description of the NUMA properties (particularly
> via HMAT) of accelerators and other initiators of memory activity that are not
> the host processor running the operating system.
> 
> This patch set introduces 'just enough' to make them work for arm64 and x86.
> It should be trivial to support other architectures, I just don't have
> suitable NUMA systems readily available to test.
> 
> There are a few quirks that need to be considered.
> 
> 1. Fall back nodes
> ******************
> 
> As operating systems predating ACPI 6.3 do not support Generic Initiator
> Proximity Domains, it is possible to specify, via _PXM in the DSDT, that
> another device is part of such a GI only node.  This currently blows up
> spectacularly.
> 
> Whilst we can obviously 'now' protect against such a situation (see the related
> thread on PCI _PXM support and the threadripper board identified there as
> also falling into the problem of using non-existent nodes
> https://patchwork.kernel.org/patch/10723311/ ), there is no way to be sure
> we will never have legacy OSes that are not protected against this.  It would
> also be 'non ideal' to fall back to a default node as there may be a better
> (non GI) node to pick if GI nodes aren't available.
> 
> The workaround is that we also have a new system wide OSC bit that allows
> an operating system to 'announce' that it supports Generic Initiators.  This
> allows the firmware to use DSDT magic to 'move' devices between the nodes
> depending on whether our new nodes are there or not.
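> 
> A minimal sketch of the OS side of this, loosely modelled on
> drivers/acpi/bus.c (treat the exact bit name and the per-architecture
> guard as the intent of the series rather than the final code):
> 
>   /* Only claim Generic Initiator support where the kernel can actually
>    * initialize GI-only nodes (x86 and arm64 in this series). */
>   if (IS_ENABLED(CONFIG_X86) || IS_ENABLED(CONFIG_ARM64))
>           capbuf[OSC_SUPPORT_DWORD] |= OSC_SB_GENERIC_INITIATOR_SUPPORT;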
> 
> 2. New ways of assigning a proximity domain for devices
> *******************************************************
> 
> Until now, the only way firmware could indicate that a particular device
> (outside the 'special' set of CPUs etc) was to be found in a particular
> Proximity Domain was by the use of _PXM in the DSDT.
> 
> That is equally valid with GI domains, but we have new options.  The SRAT
> affinity structure includes a handle (ACPI or PCI) to identify devices
> within the system and specify their proximity domain that way.  If both _PXM
> and this are provided, they should give the same answer.
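> 
> For reference, the ACPI 6.3 Generic Initiator Affinity Structure (SRAT
> subtable type 5) is laid out roughly as below; the field layout follows
> the spec, but treat the exact names as a sketch of the ACPICA definition:
> 
>   struct acpi_srat_generic_affinity {
>           struct acpi_subtable_header header;  /* type 5, length 32 */
>           u8 reserved;
>           u8 device_handle_type;               /* 0: ACPI (_HID/_UID), 1: PCI (seg/BDF) */
>           u32 proximity_domain;
>           u8 device_handle[16];
>           u32 flags;
>           u32 reserved1;
>   };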
> 
> For now this patch set completely ignores that feature as we don't need
> it to start the discussion.  It will form a follow up set at some point
> (if no one else fancies doing it).
> 
> Test cases for the access1 class
> ********************************
> 
> Test cases for Generic Initiator additions to HMAT.
> 
> Setup
> 
> PXM0 (node 0) - CPU0 CPU1, 2G memory
> PXM1 (node 1) - CPU2 CPU3, 2G memory
> PXM2 (node 2) - CPU4 CPU5, 2G memory
> PXM3 (node 4) - 2G memory (GI in one case below)
> PXM4 (node 3) - GI only.
> 
> Config 1:  GI in PXM4 nearer to memory in PXM 3 than CPUs, not direct attached
> 
> [    2.384064] acpi/hmat: HMAT: Locality: Flags:00 Type:Access Latency Initiator Domains:4 Target Domains:4 Base:256
> [    2.384913] acpi/hmat:   Initiator-Target[0-0]:1 nsec
> [    2.385190] acpi/hmat:   Initiator-Target[0-1]:9 nsec
> [    2.385736] acpi/hmat:   Initiator-Target[0-2]:9 nsec
> [    2.385984] acpi/hmat:   Initiator-Target[0-3]:9 nsec
> [    2.386447] acpi/hmat:   Initiator-Target[1-0]:9 nsec
> [    2.386740] acpi/hmat:   Initiator-Target[1-1]:1 nsec
> [    2.386964] acpi/hmat:   Initiator-Target[1-2]:9 nsec
> [    2.387174] acpi/hmat:   Initiator-Target[1-3]:9 nsec
> [    2.387624] acpi/hmat:   Initiator-Target[2-0]:9 nsec
> [    2.387953] acpi/hmat:   Initiator-Target[2-1]:9 nsec
> [    2.388155] acpi/hmat:   Initiator-Target[2-2]:1 nsec
> [    2.388607] acpi/hmat:   Initiator-Target[2-3]:9 nsec
> [    2.388861] acpi/hmat:   Initiator-Target[4-0]:13 nsec
> [    2.389126] acpi/hmat:   Initiator-Target[4-1]:13 nsec
> [    2.389574] acpi/hmat:   Initiator-Target[4-2]:13 nsec
> [    2.389805] acpi/hmat:   Initiator-Target[4-3]:5 nsec
> 
> # Sysfs reads the same for nodes 0-2 for access0 and access1 as no GI is involved.
> 
> /sys/bus/node/devices/...
>     node0 #1 and 2 similar.
>         access0
>             initiators
>                 node0
>                 read_bandwidth  0 #not specified in hmat
>                 read_latency    1
>                 write_bandwidth 0
>                 write_latency   1
>             power
>             targets
>                 node0
>             uevent
>         access1
>             initiators
>                 node0
>                 read_bandwidth  0
>                 read_latency    1
>                 write_bandwidth 0
>                 write_latency   1
>             power
>             targets
>                 node0
>             uevent
>         compact
>         cpu0
>         cpu1
>         ...
>     node3 # Note PXM 4, contains GI only
>         access0
>             initiators
>                 *empty*
>             power
>             targets
>                 node4
>             uevent
>         compact
>         ...
>     node4
>         access0
>             initiators
>                 node3
>                 read_bandwidth  0
>                 read_latency    5
>                 write_bandwidth 0
>                 write_latency   5
>             power
>             targets
>                 *empty*
>             uevent
>         access1
>             initiators
>                 node0
>                 node1
>                 node2
>                 read_bandwidth  0
>                 read_latency    9
>                 write_bandwidth 0
>                 write_latency   9
>             power
>             targets
>                 *empty*
>             uevent
>         compact
>         ...
> 
> Config 2:  GI in PXM4 further from memory in PXM 3 than CPUs, not direct attached
> 
> [    4.073493] acpi/hmat: HMAT: Locality: Flags:00 Type:Access Latency Initiator Domains:4 Target Domains:4 Base:256
> [    4.074785] acpi/hmat:   Initiator-Target[0-0]:1 nsec
> [    4.075150] acpi/hmat:   Initiator-Target[0-1]:9 nsec
> [    4.075423] acpi/hmat:   Initiator-Target[0-2]:9 nsec
> [    4.076184] acpi/hmat:   Initiator-Target[0-3]:9 nsec
> [    4.077116] acpi/hmat:   Initiator-Target[1-0]:9 nsec
> [    4.077366] acpi/hmat:   Initiator-Target[1-1]:1 nsec
> [    4.077640] acpi/hmat:   Initiator-Target[1-2]:9 nsec
> [    4.078156] acpi/hmat:   Initiator-Target[1-3]:9 nsec
> [    4.078471] acpi/hmat:   Initiator-Target[2-0]:9 nsec
> [    4.078994] acpi/hmat:   Initiator-Target[2-1]:9 nsec
> [    4.079277] acpi/hmat:   Initiator-Target[2-2]:1 nsec
> [    4.079505] acpi/hmat:   Initiator-Target[2-3]:9 nsec
> [    4.080126] acpi/hmat:   Initiator-Target[4-0]:13 nsec
> [    4.080995] acpi/hmat:   Initiator-Target[4-1]:13 nsec
> [    4.081351] acpi/hmat:   Initiator-Target[4-2]:13 nsec
> [    4.082125] acpi/hmat:   Initiator-Target[4-3]:13 nsec
> 
> /sys/bus/node/devices/...
>     node0 #1 and 2 similar.
>         access0
>             initiators
>                 node0
>                 read_bandwidth  0 #not specified in hmat
>                 read_latency    1
>                 write_bandwidth 0
>                 write_latency   1
>             power
>             targets
>                 node0
>                 node4
>             uevent
>         access1
>             initiators
>                 node0
>                 read_bandwidth  0
>                 read_latency    1
>                 write_bandwidth 0
>                 write_latency   1
>             power
>             targets
>                 node0
>                 node4
>             uevent
>         compact
>         cpu0
>         cpu1
>         ...
>     node3 # Note PXM 4, contains GI only
>         #No accessX directories.
>         compact
>         ...
>     node4
>         access0
>             initiators
>                 node0
>                 node1
>                 node2
>                 read_bandwidth  0
>                 read_latency    9
>                 write_bandwidth 0
>                 write_latency   9
>             power
>             targets
>                 *empty*
>             uevent
>         access1
>             initiators
>                 node0
>                 node1
>                 node2
>                 read_bandwidth  0
>                 read_latency    9
>                 write_bandwidth 0
>                 write_latency   9
>             power
>             targets
>                 *empty*
>             uevent
>         compact
>         ...
> 
> 
> Case 3 - as per Config 2 but now the memory in node 3 is direct attached to the
> GI, while nearer the main nodes (not physically sensible :))
> 
> /sys/bus/node/devices/...
>     node0 #1 and 2 similar.
>         access0
>             initiators
>                 node0
>                 read_bandwidth  0 #not specified in hmat
>                 read_latency    1
>                 write_bandwidth 0
>                 write_latency   1
>             power
>             targets
>                 node0
>                 node4
>             uevent
>         access1
>             initiators
>                 node0
>                 read_bandwidth  0
>                 read_latency    1
>                 write_bandwidth 0
>                 write_latency   1
>             power
>             targets
>                 node0
>                 node4
>             uevent
>         compact
>         cpu0
>         cpu1
>         ...
>     node3 # Note PXM 4, contains GI only
>         access0
>             initiators
>                 *empty*
>             power
>             targets
>                 node4
>             uevent
>         compact
>         ...
>     node4
>         access0
>             initiators
>                 node3
>                 read_bandwidth  0
>                 read_latency    13
>                 write_bandwidth 0
>                 write_latency   13
>             power
>             targets
>                 *empty*
>             uevent
>         access1
>             initiators
>                 node0
>                 node1
>                 node2
>                 read_bandwidth  0
>                 read_latency    9
>                 write_bandwidth 0
>                 write_latency   9
>             power
>             targets
>                 *empty*
>             uevent
>         compact
>         ...
> 
> Case 4 - nearer the GI, but direct attached to one of the CPUs.
> # Another bonkers one.
> 
> /sys/bus/node/devices/...
>     node0 #1 similar.
>         access0
>             initiators
>                 node0
>                 read_bandwidth  0 #not specified in hmat
>                 read_latency    1
>                 write_bandwidth 0
>                 write_latency   1
>             power
>             targets
>                 node0
>                 node4
>             uevent
>         access1
>             initiators
>                 node0
>                 read_bandwidth  0
>                 read_latency    1
>                 write_bandwidth 0
>                 write_latency   1
>             power
>             targets
>                 node0
>             uevent
>         compact
>         cpu0
>         cpu1
>         ...
>     node2 # Direct attached to memory in node 3
>         access0
>             initiators
>                 node2
>                 read_bandwidth  0 #not specified in hmat
>                 read_latency    1
>                 write_bandwidth 0
>                 write_latency   1
>             power
>             targets
>                 node2
>                 node4 #direct attached
>             uevent
>         access1
>             initiators
>                 node2
>                 read_bandwidth  0
>                 read_latency    1
>                 write_bandwidth 0
>                 write_latency   1
>             power
>             targets
>                 node2
>                 node4 #direct attached
>             uevent
>         compact
>         cpu0
>         cpu1
>         ...
> 
>     node3 # Note PXM 4, contains GI only
>         #No accessX directories.
>         compact
>         ...
>     node4
>         access0
>             initiators
>                 node3
>                 read_bandwidth  0
>                 read_latency    13
>                 write_bandwidth 0
>                 write_latency   13
>             power
>             targets
>                 *empty*
>             uevent
>         access1
>             initiators
>                 node0
>                 node1
>                 node2
>                 read_bandwidth  0
>                 read_latency    9
>                 write_bandwidth 0
>                 write_latency   9
>             power
>             targets
>                 *empty*
>             uevent
>         compact
>         ...
> 
> Case 5 - memory and GI together in node 3 (added an extra GI to node 3).
> Note the HMAT should also reflect this extra initiator domain.
> 
> /sys/bus/node/devices/...
>     node0 #1 and 2 similar.
>         access0
>             initiators
>                 node0
>                 read_bandwidth  0 #not specified in hmat
>                 read_latency    1
>                 write_bandwidth 0
>                 write_latency   1
>             power
>             targets
>                 node0
>                 node4
>             uevent
>         access1
>             initiators
>                 node0
>                 read_bandwidth  0
>                 read_latency    1
>                 write_bandwidth 0
>                 write_latency   1
>             power
>             targets
>                 node0
>             uevent
>         compact
>         cpu0
>         cpu1
>         ...
>     node3 # Note PXM 3, contains GI only
>         #No accessX directories.
>         compact
>         ...
>     node4 # Now memory and GI.
>         access0
>             initiators
>                 node4
>                 read_bandwidth  0
>                 read_latency    1
>                 write_bandwidth 0
>                 write_latency   1
>             power
>             targets
>                 node4
>             uevent
>         access1
>             initiators
>                 node0
>                 node1
>                 node2
>                 read_bandwidth  0
>                 read_latency    9
>                 write_bandwidth 0
>                 write_latency   9
>             power
>             targets
>                 *empty* # as expected the GI doesn't participate in access1.
>             uevent
>         compact
>         ...
> 
> Jonathan Cameron (6):
>   ACPI: Support Generic Initiator only domains
>   x86: Support Generic Initiator only proximity domains
>   ACPI: Let ACPI know we support Generic Initiator Affinity Structures
>   ACPI: HMAT: Fix handling of changes from ACPI 6.2 to ACPI 6.3
>   node: Add access1 class to represent CPU to memory characteristics
>   docs: mm: numaperf.rst Add brief description for access class 1.
> 
>  Documentation/admin-guide/mm/numaperf.rst |  8 ++
>  arch/x86/include/asm/numa.h               |  2 +
>  arch/x86/kernel/setup.c                   |  1 +
>  arch/x86/mm/numa.c                        | 14 ++++
>  drivers/acpi/bus.c                        |  4 +
>  drivers/acpi/numa/hmat.c                  | 90 ++++++++++++++++++-----
>  drivers/acpi/numa/srat.c                  | 69 ++++++++++++++++-
>  drivers/base/node.c                       |  3 +
>  include/linux/acpi.h                      |  1 +
>  include/linux/nodemask.h                  |  1 +
>  10 files changed, 172 insertions(+), 21 deletions(-)
> 




