[PATCH] arm64: dts: allwinner: Add cache information to the SoC dtsi for H6
Andre Przywara
andre.przywara at arm.com
Wed May 1 02:30:59 PDT 2024
On Tue, 30 Apr 2024 13:10:41 +0200
Dragan Simic <dsimic at manjaro.org> wrote:
> Hello Andre,
>
> On 2024-04-30 12:46, Andre Przywara wrote:
> > On Tue, 30 Apr 2024 02:01:42 +0200
> > Dragan Simic <dsimic at manjaro.org> wrote:
> >> On 2024-04-30 01:10, Andre Przywara wrote:
> >> > On Sun, 28 Apr 2024 13:40:36 +0200
> >> > Dragan Simic <dsimic at manjaro.org> wrote:
> >> >
> >> >> Add missing cache information to the Allwinner H6 SoC dtsi, to allow
> >> >> the userspace, which includes lscpu(1) that uses the virtual files
> >> >> provided
> >> >> by the kernel under the /sys/devices/system/cpu directory, to display
> >> >> the
> >> >> proper H6 cache information.
> >> >>
> >> >> Adding the cache information to the H6 SoC dtsi also makes the
> >> >> following
> >> >> warning message in the kernel log go away:
> >> >>
> >> >> cacheinfo: Unable to detect cache hierarchy for CPU 0
> >> >>
> >> >> The cache parameters for the H6 dtsi were obtained and partially
> >> >> derived
> >> >> by hand from the cache size and layout specifications found in the
> >> >> following
> >> >> datasheets and technical reference manuals:
> >> >>
> >> >> - Allwinner H6 V200 datasheet, version 1.1
> >> >> - ARM Cortex-A53 revision r0p3 TRM, version E
> >> >>
> >> >> For future reference, here's a brief summary of the documentation:
> >> >>
> >> >> - All caches employ the 64-byte cache line length
> >> >> - Each Cortex-A53 core has 32 KB of L1 2-way, set-associative
> >> >> instruction
> >> >> cache and 32 KB of L1 4-way, set-associative data cache
> >> >> - The entire SoC has 512 KB of unified L2 16-way, set-associative
> >> >> cache
> >> >>
> >> >> Signed-off-by: Dragan Simic <dsimic at manjaro.org>
> >> >
> >> > I can confirm that the data below matches the manuals, but also the
> >> > decoding of the architectural cache type registers (CCSIDR_EL1):
> >> > L1D: 32 KB: 128 sets, 4 way associative, 64 bytes/line
> >> > L1I: 32 KB: 256 sets, 2 way associative, 64 bytes/line
> >> > L2: 512 KB: 512 sets, 16 way associative, 64 bytes/line
> >>
> >> Thank you very much for reviewing my patch in such a detailed way!
> >> It's good to know that the values in the Allwinner datasheets match
> >> with the observed reality, so to speak. :)
> >
> > YW, and yes, I like to double check things when it comes to Allwinner
> > documentation ;-) And it was comparably easy for this problem.
>
> Double checking is always good, IMHO. :)
>
> > Out of curiosity: what triggered that patch? Trying to get rid of false
> > warning/error messages?
>
> Yes, one of the motivators was to get rid of the false kernel warning,
> and the other was to have the cache information nicely available through
> lscpu(1). I already did the same for a few Rockchip SoCs, [1][2][3] so
> a couple of Allwinner SoCs were the next on my mental TODO list. :)
Thanks for doing this!
> > And do you plan to address the H616 as well? It's a bit more tricky
> > there,
> > since there are two die revisions out: one with 256(?)KB of L2, one
> > with
> > 1MB(!). We know how to tell them apart, so I could provide some TF-A
> > code
> > to patch that up in the DT. The kernel DT copy could go with 256KB
> > then.
>
> I have no boards based on the Allwinner H616, so it wasn't on my radar.
> Though, I'd be happy to prepare and submit a similar kernel patch for
> the H616, if you'd then take it further and submit a TF-A patch that
> fixes the DT according to the detected die revision? Did I understand
> the plan right?
Yes, that was the idea. I have a working version of that TF-A patch now;
I just need to figure out the best way to build it only for the H616
port.
Neither the data sheet nor the user manual mentions the cache sizes for
the H616, but I checked the CCSIDR_EL1 register readouts on both an old
H616 and a new H618, and they confirm that the former has 256 KB of L2
and the latter 1 MB. I also ran tinymembench on two boards to confirm
this; community benchmark results are available here:
https://github.com/ThomasKaiser/sbc-bench/blob/master/Results.md
The OrangePi Zero2 and OrangePi Zero3 are good examples, respectively.
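
For reference, here is a minimal sketch of how such a CCSIDR_EL1 readout
decodes, assuming the 32-bit register layout without FEAT_CCIDX (LineSize
in bits [2:0], Associativity in bits [12:3], NumSets in bits [27:13]).
The function name and the sample values are illustrative only: the values
are built from that field layout with the upper attribute bits left at
zero, they are not copied from real hardware.

```python
def decode_ccsidr(value):
    """Decode a 32-bit CCSIDR_EL1 value (layout without FEAT_CCIDX)."""
    # LineSize holds log2(bytes per line) - 4
    line_size = 1 << ((value & 0x7) + 4)
    # Associativity holds (number of ways) - 1
    ways = ((value >> 3) & 0x3FF) + 1
    # NumSets holds (number of sets) - 1
    sets = ((value >> 13) & 0x7FFF) + 1
    return {
        "sets": sets,
        "ways": ways,
        "line_size": line_size,
        "size_kb": sets * ways * line_size // 1024,
    }


# Illustrative values matching the H6 geometry quoted earlier in the thread
# (assumption: constructed from the field layout, not real readouts).
SAMPLES = {
    "L1D": 0x00FE01A,  # 128 sets, 4 ways, 64 bytes/line
    "L1I": 0x01FE00A,  # 256 sets, 2 ways, 64 bytes/line
    "L2": 0x03FE07A,   # 512 sets, 16 ways, 64 bytes/line
}

if __name__ == "__main__":
    for name, raw in SAMPLES.items():
        info = decode_ccsidr(raw)
        print(f"{name}: {info['size_kb']} KB: {info['sets']} sets, "
              f"{info['ways']} way associative, "
              f"{info['line_size']} bytes/line")
```

The total size is simply sets * ways * line size, which is why the
register readout alone is enough to tell the two die revisions apart.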
Associativity and cache line size are dictated by the Arm Cortex cores,
and the L1I & L1D sizes are the same as in the other SoCs.
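
With associativity and line size fixed, the only L2 values that differ
between the two die revisions are the size and the number of sets. A
rough sketch of that arithmetic, assuming the same 16-way, 64-byte L2
geometry as quoted for the H6 above; the helper name is made up for
illustration, while cache-size, cache-line-size and cache-sets are the
usual DT cache properties:

```python
def l2_dt_properties(size_bytes, ways=16, line_size=64):
    """Derive DT cache property values for a unified L2 cache.

    Assumption: 16 ways and 64-byte lines, as for the H6 L2 in this thread.
    """
    sets, remainder = divmod(size_bytes, ways * line_size)
    if remainder:
        raise ValueError("size is not a multiple of ways * line size")
    return {
        "cache-size": size_bytes,
        "cache-line-size": line_size,
        "cache-sets": sets,
    }


if __name__ == "__main__":
    for label, size in (("256 KB", 256 * 1024), ("1 MB", 1024 * 1024)):
        props = l2_dt_properties(size)
        # Print in the hex/decimal style commonly used in dtsi files
        print(f"{label}: cache-size = <{props['cache-size']:#x}>; "
              f"cache-line-size = <{props['cache-line-size']}>; "
              f"cache-sets = <{props['cache-sets']}>;")
```

So a fixup for the larger variant would need to touch both cache-size and
cache-sets, not just the size.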
Cheers,
Andre
> [1]
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=67a6a98575974416834c2294853b3814376a7ce7
> [2]
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=8612169a05c5e979af033868b7a9b177e0f9fcdf
> [3]
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=b72633ba5cfa932405832de25d0f0a11716903b4