[PATCH] arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)

Yassine Oudjana y.oudjana at protonmail.com
Tue Jul 6 10:15:53 PDT 2021


On Tuesday, July 6th, 2021 at 7:43 PM, Arnd Bergmann <arnd at arndb.de> wrote:
> On Tue, Jul 6, 2021 at 4:46 PM Marc Zyngier maz at kernel.org wrote:
> > On Tue, 06 Jul 2021 15:30:34 +0100, Arnd Bergmann arnd at arndb.de wrote:
> > > I can only speculate on how much got reused between the two, but
> > > as Falkor was released only after they had already given up on
> > > the full-custom Kryo core, it's plausible that it incorporates bits from
> > > that one. In particular the cache controller is probably easy to reuse
> > > even if the rest of it was a new design.
> >
> > I guess we'll never find out, and I'm probably one of the few still
> > having some access to this HW (not even sure for how long anyway).
> >
> > I won't cry if we decide to pull the plug on it.
>
> Sure, but the Snapdragon 820E is one we do need to worry about.
> While the internet pretty much agrees on Falkor having 128 bytes
> L1 cache line, it might be good to rule out that Kryo just misreports
> it before we revert the patch.
>
> Yassine, could you run the 'line' and 'cache' helper from lmbench
> to determine what the cache topology appears to be and if that
> matches the CTR_EL0 contents?
>
> Something like
>
> numactl -C 0 line -M 1M
> numactl -C 3 line -M 1M
> numactl -C 0 cache
> numactl -C 3 cache
>
> (the numactl command helps run this both on the 'big' and 'little'
> cores without running into migration)
>
> Arnd

Here are the results:

$ numactl -C 0 line -M 1M
128
$ numactl -C 3 line -M 1M
128
$ numactl -C 0 cache
L1 cache: 512 bytes 1.37 nanoseconds 64 linesize -1.00 parallelism
L2 cache: 24576 bytes 2.75 nanoseconds 64 linesize 5.06 parallelism
L3 cache: 131072 bytes 7.89 nanoseconds 64 linesize 3.85 parallelism
L4 cache: 524288 bytes 15.86 nanoseconds 128 linesize 3.48 parallelism
Memory latency: 145.93 nanoseconds 4.88 parallelism
$ numactl -C 3 cache
L1 cache: 24576 bytes 1.29 nanoseconds 64 linesize 5.00 parallelism
L2 cache: 1048576 bytes 8.60 nanoseconds 128 linesize 3.07 parallelism
Memory latency: 143.29 nanoseconds 5.37 parallelism




More information about the linux-arm-kernel mailing list