[PATCH] arm64: dts: allwinner: Add cache information to the SoC dtsi for H616

Dragan Simic dsimic at manjaro.org
Wed May 8 04:15:49 PDT 2024


Hello Andre,

On 2024-05-08 13:05, Andre Przywara wrote:
> On Fri,  3 May 2024 11:09:41 +0200
> Dragan Simic <dsimic at manjaro.org> wrote:
> 
>> Add missing cache information to the Allwinner H616 SoC dtsi, to allow
>> the userspace, which includes lscpu(1) that uses the virtual files 
>> provided
>> by the kernel under the /sys/devices/system/cpu directory, to display 
>> the
>> proper H616 cache information.
>> 
>> Adding the cache information to the H616 SoC dtsi also makes the 
>> following
>> warning message in the kernel log go away:
>> 
>>   cacheinfo: Unable to detect cache hierarchy for CPU 0
>> 
>> Rather conspicuously, almost no cache-related information is available 
>> in
>> the publicly available Allwinner H616 datasheet (version 1.0) and H616 
>> user
>> manual (version 1.0).  Thus, the cache parameters for the H616 SoC 
>> dtsi were
>> obtained and derived by hand from the cache size and layout 
>> specifications
>> found in the following technical reference manual, and from the cache 
>> size
>> and die revision hints available from the following community-provided 
>> data
>> and memory subsystem benchmarks:
>> 
>>   - ARM Cortex-A53 revision r0p4 TRM, version J
>>   - Summary of the two available H616 die revisions and their 
>> differences
>>     in cache sizes observed from the CSSIDR_EL1 register readouts, 
>> provided
>>     by Andre Przywara [1][2]
>>   - Tinymembench benchmark results of the H616-based OrangePi Zero 2 
>> SBC,
>>     provided by Thomas Kaiser [3]
>> 
>> For future reference, here's a brief summary of the available 
>> documentation
>> and the community-provided data and memory subsystem benchmarks:
>> 
>>   - All caches employ the 64-byte cache line length
>>   - Each Cortex-A53 core has 32 KB of L1 2-way, set-associative 
>> instruction
>>     cache and 32 KB of L1 4-way, set-associative data cache
>>   - The size of the L2 cache depends on the actual H616 die revision 
>> (there
>>     are two die revisions), so the entire SoC can have either 256 KB 
>> or 1 MB
>>     of unified L2 16-way, set-associative cache [1]
>> 
>> Also for future reference, here's the relevant excerpt from the 
>> community-
>> provided H616 memory subsystem benchmark, [3] which confirms that 32 
>> KB and
>> 256 KB are the L1 data and L2 cache sizes, respectively:
>> 
>>     block size : single random read / dual random read
>>           1024 :    0.0 ns          /     0.0 ns
>>           2048 :    0.0 ns          /     0.0 ns
>>           4096 :    0.0 ns          /     0.0 ns
>>           8192 :    0.0 ns          /     0.0 ns
>>          16384 :    0.0 ns          /     0.0 ns
>>          32768 :    0.0 ns          /     0.0 ns
>>          65536 :    4.3 ns          /     7.3 ns
>>         131072 :    6.6 ns          /    10.5 ns
>>         262144 :    9.8 ns          /    15.2 ns
>>         524288 :   91.8 ns          /   142.9 ns
>>        1048576 :  138.6 ns          /   188.3 ns
>>        2097152 :  163.0 ns          /   204.8 ns
>>        4194304 :  178.8 ns          /   213.5 ns
>>        8388608 :  187.1 ns          /   217.9 ns
>>       16777216 :  192.2 ns          /   220.9 ns
>>       33554432 :  196.5 ns          /   224.0 ns
>>       67108864 :  215.7 ns          /   259.5 ns
> 
> Thanks for dumping the elaborate information here!

You're welcome! :)  I like when patch descriptions provide as much
relevant information as possible, so I always try to do that myself.

>> The changes introduced to the H616 SoC dtsi by this patch specify 256 
>> KB as
>> the L2 cache size.  As outlined by Andre Przywara, [2] a follow-up 
>> TF-A patch
>> will perform runtime adjustment of the device tree data, making the 
>> correct
>> L2 cache size of 1 MB present in the device tree for the boards based 
>> on the
>> revision of H616 that actually provides 1 MB of L2 cache.
> 
> I pushed that TF-A patch for review now:
> https://review.trustedfirmware.org/c/TF-A/trusted-firmware-a/+/28694/1
> On my OrangePi Zero3 (with an 1MB H618 SoC) the size and number of sets
> get adjusted to describe 1MB:
> => fdt list /cpus/l2-cache
> l2-cache {
>         compatible = "cache";
>         cache-level = <0x00000002>;
>         cache-unified;
>         cache-size = <0x00100000>;
>         cache-line-size = <0x00000040>;
>         cache-sets = <0x00000400>;
>         phandle = <0x00000003>;
> };

Awesome, thanks for the follow-up TF-A patch!  I'll keep an eye
on your TF-A patch submission.

>> [1] 
>> https://lore.kernel.org/linux-sunxi/20240430114627.0cfcd14a@donnerap.manchester.arm.com/
>> [2] 
>> https://lore.kernel.org/linux-sunxi/20240501103059.10a8f7de@donnerap.manchester.arm.com/
>> [3] 
>> https://raw.githubusercontent.com/ThomasKaiser/sbc-bench/master/results/4knM.txt
>> 
>> Suggested-by: Andre Przywara <andre.przywara at arm.com>
>> Helped-by: Andre Przywara <andre.przywara at arm.com>
>> Signed-off-by: Dragan Simic <dsimic at manjaro.org>
> 
> So I can confirm that the information above is correct, and also 
> matches
> the DT properties added below.
> 
> Reviewed-by: Andre Przywara <andre.przywara at arm.com>

Thanks!

>> ---
>>  .../arm64/boot/dts/allwinner/sun50i-h616.dtsi | 37 
>> +++++++++++++++++++
>>  1 file changed, 37 insertions(+)
>> 
>> diff --git a/arch/arm64/boot/dts/allwinner/sun50i-h616.dtsi 
>> b/arch/arm64/boot/dts/allwinner/sun50i-h616.dtsi
>> index b2e85e52d1a1..4faed88d8909 100644
>> --- a/arch/arm64/boot/dts/allwinner/sun50i-h616.dtsi
>> +++ b/arch/arm64/boot/dts/allwinner/sun50i-h616.dtsi
>> @@ -26,30 +26,67 @@ cpu0: cpu at 0 {
>>  			reg = <0>;
>>  			enable-method = "psci";
>>  			clocks = <&ccu CLK_CPUX>;
>> +			i-cache-size = <0x8000>;
>> +			i-cache-line-size = <64>;
>> +			i-cache-sets = <256>;
>> +			d-cache-size = <0x8000>;
>> +			d-cache-line-size = <64>;
>> +			d-cache-sets = <128>;
>> +			next-level-cache = <&l2_cache>;
>>  		};
>> 
>>  		cpu1: cpu at 1 {
>>  			compatible = "arm,cortex-a53";
>>  			device_type = "cpu";
>>  			reg = <1>;
>>  			enable-method = "psci";
>>  			clocks = <&ccu CLK_CPUX>;
>> +			i-cache-size = <0x8000>;
>> +			i-cache-line-size = <64>;
>> +			i-cache-sets = <256>;
>> +			d-cache-size = <0x8000>;
>> +			d-cache-line-size = <64>;
>> +			d-cache-sets = <128>;
>> +			next-level-cache = <&l2_cache>;
>>  		};
>> 
>>  		cpu2: cpu at 2 {
>>  			compatible = "arm,cortex-a53";
>>  			device_type = "cpu";
>>  			reg = <2>;
>>  			enable-method = "psci";
>>  			clocks = <&ccu CLK_CPUX>;
>> +			i-cache-size = <0x8000>;
>> +			i-cache-line-size = <64>;
>> +			i-cache-sets = <256>;
>> +			d-cache-size = <0x8000>;
>> +			d-cache-line-size = <64>;
>> +			d-cache-sets = <128>;
>> +			next-level-cache = <&l2_cache>;
>>  		};
>> 
>>  		cpu3: cpu at 3 {
>>  			compatible = "arm,cortex-a53";
>>  			device_type = "cpu";
>>  			reg = <3>;
>>  			enable-method = "psci";
>>  			clocks = <&ccu CLK_CPUX>;
>> +			i-cache-size = <0x8000>;
>> +			i-cache-line-size = <64>;
>> +			i-cache-sets = <256>;
>> +			d-cache-size = <0x8000>;
>> +			d-cache-line-size = <64>;
>> +			d-cache-sets = <128>;
>> +			next-level-cache = <&l2_cache>;
>> +		};
>> +
>> +		l2_cache: l2-cache {
>> +			compatible = "cache";
>> +			cache-level = <2>;
>> +			cache-unified;
>> +			cache-size = <0x40000>;
>> +			cache-line-size = <64>;
>> +			cache-sets = <256>;
>>  		};
>>  	};
>> 



More information about the linux-arm-kernel mailing list