[RFC PATCH 4/4] arm64:numa: adding numa support for arm64 platforms.

Mark Rutland mark.rutland at arm.com
Mon Oct 6 04:26:40 PDT 2014


On Mon, Oct 06, 2014 at 06:14:36AM +0100, Ganapatrao Kulkarni wrote:
> Hi Mark,
> 
> On Fri, Oct 3, 2014 at 5:43 PM, Mark Rutland <mark.rutland at arm.com> wrote:
> > On Thu, Sep 25, 2014 at 10:03:59AM +0100, Ganapatrao Kulkarni wrote:
> >> Adding numa support for arm64 based platforms.
> >> This version creates numa mapping by parsing the dt table.
> >> cpu to node id mapping is derived from cluster_id as defined in cpu-map.
> >> memory to node id mapping is derived from nid property of memory node.
> >
> > [...]
> >
> >> +/*
> >> + * Too small node sizes may confuse the VM badly. Usually they
> >> + * result from BIOS bugs. So dont recognize nodes as standalone
> >> + * NUMA entities that have less than this amount of RAM listed:
> >> + */
> >> +#define NODE_MIN_SIZE (4*1024*1024)
> >
> > Why do these confuse the VM? what does BIOS have to do with arm64?
> sneaked in from x86, will remove this.
> >
> >> +
> >> +#define parent_node(node)      (node)
> >
> > Huh?
> for thunder, no hierarchy at numa nodes. shall i put under ifdef or
> separate header file?

I was confused by a node being its own parent, but that seems to be the
case elsewhere for parent_node() implementations. Please at least have a
comment that we're assuming a flat hierarchy (for now).

> >
> > [...]
> >
> >> @@ -168,6 +191,11 @@ void __init bootmem_init(void)
> >>         min = PFN_UP(memblock_start_of_DRAM());
> >>         max = PFN_DOWN(memblock_end_of_DRAM());
> >>
> >> +       high_memory = __va((max << PAGE_SHIFT) - 1) + 1;
> >> +       max_pfn = max_low_pfn = max;
> >> +
> >> +       if (IS_ENABLED(CONFIG_NUMA))
> >> +               arm64_numa_init();
> >
> > Is this function defined if !CONFIG_NUMA? Surely it must do nothing in
> > that case anyway?
> yes, if is not required, will remove it.
> >
> > [...]
> >
> >> +/*
> >> + *  Set the cpu to node and mem mapping
> >> + */
> >> +void numa_store_cpu_info(cpu)
> >> +{
> >> +       cpu_to_node_map[cpu] = cpu_topology[cpu].cluster_id;
> >> +       cpumask_set_cpu(cpu, node_to_cpumask_map[cpu_to_node_map[cpu]]);
> >> +       set_numa_node(cpu_to_node_map[cpu]);
> >> +       set_numa_mem(local_memory_node(cpu_to_node_map[cpu]));
> >> +}
> >
> > I don't like this. I think we need to be more explicit in the DT w.r.t.
> > the relationship between memory and the CPU hierarchy.
> >
> > I can imagine that we might end up with systems with multiple levels of
> > NUMA hierarchy (using MPIDR_EL1.Aff{3,2}), and I'd rather that we were
> > explcit as possible from the start w.r.t. the relationship between
> > memory and groups of CPUs such that we don't end up with multiple ways
> > of specifying said relationship, and horrible edge cases around implicit
> > definitions (e.g. the nid to cluster mapping).
> are you recomending to have explicit nid attribute to each cpu device node?

I am recommending that we make the relationship explicit. If anything,
using the cpu-map (with phandles) seems like a better approach.

> >> +/**
> >> + * dummy_numa_init - Fallback dummy NUMA init
> >> + *
> >> + * Used if there's no underlying NUMA architecture, NUMA initialization
> >> + * fails, or NUMA is disabled on the command line.
> >> + *
> >> + * Must online at least one node and add memory blocks that cover all
> >> + * allowed memory.  This function must not fail.
> >> + */
> >> +static int __init dummy_numa_init(void)
> >> +{
> >> +       pr_info("%s\n",
> >> +              numa_off ? "NUMA turned off" : "No NUMA configuration found");
> >
> > Why not print "NUMA turned off" in numa_setup?
> enters this function only when, numa is turned off or the DT/ACPI
> based numa init fails.

Sure. Moving the "NUMA turned off" print into numa_setup would mean you
could just print "Using dummy NUMA layout" or something to that effect
here -- the function has no need to care about the value of numa_off.

> >
> >> +       pr_info("Faking a node at [mem %#018Lx-%#018Lx]\n",
> >> +              0LLU, PFN_PHYS(max_pfn) - 1);
> >> +
> >> +       node_set(0, numa_nodes_parsed);
> >> +       numa_add_memblk(0, 0, PFN_PHYS(max_pfn));
> >> +
> >> +       return 0;
> >> +}
> >> +
> >> +/**
> >> + * early_init_dt_scan_numa_map - parse memory node and map nid to memory range.
> >> + */
> >> +int __init early_init_dt_scan_numa_map(unsigned long node, const char *uname,
> >> +                                    int depth, void *data)
> >> +{
> >> +       const char *type = of_get_flat_dt_prop(node, "device_type", NULL);
> >> +       const __be32 *reg, *endp, *nid_prop;
> >> +       int l, nid;
> >> +
> >> +       /* We are scanning "memory" nodes only */
> >> +       if (type == NULL) {
> >> +               /*
> >> +                * The longtrail doesn't have a device_type on the
> >> +                * /memory node, so look for the node called /memory at 0.
> >> +                */
> >> +               if (depth != 1 || strcmp(uname, "memory at 0") != 0)
> >> +                       return 0;
> >
> > This has no place on arm64.
> i am not sure that we can move to driver/of, at this moment this is
> arm64 specific binding.

I meant the longtrail workaround, hence my comments on that below.

> > We limited to longtrail workaround in the core memory parsing to PPC32
> > only in commit b44aa25d20e2ef6b (of: Handle memory at 0 node on PPC32
> > only). This code doesn't need it enabled ever.
> >
> > Are you booting using UEFI? This isn't going to work when the memory map
> tried with bootwrapper, working on to boot from UEFI.
> > comes from UEFI and we have no memory nodes in the DTB.
> yes, there is issue with UEFI boot, since memory node is removed.
> i request UEFI stub dev-team to suggest the possible ways to address this.

I've Cc'd a few people who have worked on the stub and/or EFI memory map
stuff. It would be worth keeping them on Cc so as to keep them informed.

I believe that the EFI stub is doing the right thing by ensuring that
the EFI memory map is used, so this is just another configuration that
your binding has to consider.

Mark.



More information about the linux-arm-kernel mailing list