[RFC PATCH v2 2/4] Documentation: arm64/arm: dt bindings for numa.

Hanjun Guo hanjun.guo at linaro.org
Wed Nov 26 01:12:49 PST 2014


On 2014-11-26 3:00, Arnd Bergmann wrote:
> On Tuesday 25 November 2014 08:15:47 Ganapatrao Kulkarni wrote:
>>> No, don't hardcode ARM specifics into a common binding either. I've looked
>>> at the ibm,associativity properties again, and I think we should just use
>>> those, they can cover all cases and are completely independent of the
>>> architecture. We should probably discuss about the property name though,
>>> as using the "ibm," prefix might not be the best idea.
>>
>> We started with a new proposal because we could not find enough details
>> on how ibm/ppc manages NUMA using DT: there is no documentation, the
>> POWER/PAPR spec for NUMA is not in the public domain, and there is not a
>> single dt file in arch/powerpc that describes NUMA. If we get any one of
>> these details, we can align with the powerpc implementation.
> 
> Basically the idea is to have an "ibm,associativity" property in each
> bus or device that is node specific, and this includes all CPUs and
> memory nodes. The property contains an array of 32-bit integers that
> identify the position of the resource at each level of the topology.
> Take the example of a NUMA cluster of two boards with four sockets
> and four cores each (32 cores total), a memory channel on each socket
> and one PCI host per board that is connected at equal speed to each
> socket on the board.
> 
> The ibm,associativity property in each PCI host, CPU or memory device
> node consequently has an array of three (board, socket, core) integers:
> 
> 	memory@0,0 {
> 		device_type = "memory";
> 		reg = <0x0 0x0  0x4 0x0>;
> 		/* board 0, socket 0, no specific core */
> 		ibm,associativity = <0 0 0xffff>;
> 	};
> 
> 	memory@4,0 {
> 		device_type = "memory";
> 		reg = <0x4 0x0  0x4 0x0>;
> 		/* board 0, socket 1, no specific core */
> 		ibm,associativity = <0 1 0xffff>;
> 	};
> 
> 	...
> 
> 	memory@1c,0 {
> 		device_type = "memory";
> 		reg = <0x1c 0x0  0x4 0x0>;
> 		/* board 1, socket 7, no specific core */
> 		ibm,associativity = <1 7 0xffff>;
> 	};
> 
> 	cpus {
> 		#address-cells = <2>;
> 		#size-cells = <0>;
> 		cpu@0 {
> 			device_type = "cpu";
> 			reg = <0 0>;
> 			/* board 0, socket 0, core 0 */
> 			ibm,associativity = <0 0 0>;
> 		};
> 
> 		cpu@1 {
> 			device_type = "cpu";
> 			reg = <0 1>;
> 			/* board 0, socket 0, core 1 */
> 			ibm,associativity = <0 0 1>;
> 		};
> 
> 		...
> 
> 		cpu@31 {
> 			device_type = "cpu";
> 			reg = <0 31>;
> 			/* board 1, socket 7, core 31 */
> 			ibm,associativity = <1 7 31>;
> 		};
> 	};
> 
> 	pci@100,0 {
> 		device_type = "pci";
> 		/* board 0 */
> 		ibm,associativity = <0 0xffff 0xffff>;
> 		...
> 	};
> 
> 	pci@200,0 {
> 		device_type = "pci";
> 		/* board 1 */
> 		ibm,associativity = <1 0xffff 0xffff>;
> 		...
> 	};
> 
> 	ibm,associativity-reference-points = <0 1>;
> 
> The "ibm,associativity-reference-points" property here indicates that index 2
> of each array is the most important NUMA boundary for the particular system,
> because the performance impact of allocating memory on the remote board 
> is more significant than the impact of using memory on a remote socket of the
> same board. Linux will consequently use the first field in the array as
> the NUMA node ID. If the link between the boards however is relatively fast,
> so you care mostly about allocating memory on the same socket, but going to
> another board isn't much worse than going to another socket on the same
> board, this would be
> 
> 	ibm,associativity-reference-points = <1 0>;
> 
> so Linux would ignore the board ID and use the socket ID as the NUMA node
> number. The same would apply if you have only one (otherwise identical)
> board; then you would get
> 
> 	ibm,associativity-reference-points = <1>;
> 
> which means that index 0 is completely irrelevant for NUMA considerations
> and you just care about the socket ID. In this case, devices on the PCI
> bus would also not care about NUMA policy and just allocate buffers from
> anywhere, while in the original example Linux would allocate DMA buffers only
> from the local board.

Thanks for the detailed information. I still have a concern about the distance
between NUMA nodes: can the "ibm,associativity-reference-points" property also
represent the distance between NUMA nodes?
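
To make sure I follow the node-id lookup itself, I would expect the OS side
to do something roughly like the sketch below (helper name made up, and using
the 0-based indexing from your example rather than the actual powerpc code):

#include <linux/errno.h>
#include <linux/numa.h>
#include <linux/types.h>

/*
 * Sketch only: turn one "ibm,associativity" array plus the
 * "ibm,associativity-reference-points" list into a NUMA node id.
 * The first reference point selects the field that becomes the node
 * id; the remaining entries only mark less important boundaries.
 */
static int assoc_to_nid(const u32 *assoc, u32 assoc_len,
			const u32 *ref_points, u32 nr_refs)
{
	u32 idx;

	if (!nr_refs)
		return 0;		/* no NUMA information at all */

	idx = ref_points[0];
	if (idx >= assoc_len)
		return -EINVAL;		/* malformed property */

	/* 0xffff in the selected field: not tied to a specific node */
	if (assoc[idx] == 0xffff)
		return NUMA_NO_NODE;

	return assoc[idx];
}

If that is right, then the arrays only describe a containment hierarchy,
which is why I am wondering about distances.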

For example, a system with 4 sockets connected like below:

Socket 0  <---->  Socket 1  <---->  Socket 2  <---->  Socket 3

So from socket 0 to socket 1 (maybe on the same board), it only takes 1 hop
to access the memory, but from socket 0 to socket 2/3 it takes 2/3 hops, so
the *distance* is relatively longer. Can the
"ibm,associativity-reference-points" property cover this?

Thanks
Hanjun



