[PATCH v8 0/4] perf: arm-spe: Decode SPE source and use for perf c2c
Leo Yan
leo.yan at linaro.org
Sat May 21 23:15:33 PDT 2022
Hi Joe,
On Thu, May 19, 2022 at 11:16:53AM -0400, Joe Mario wrote:
[...]
> Hi Leo:
> Thanks for getting this working on ARM. I do have a few comments.
>
> I built and ran this on an ARM Neoverse-N1 system with 2 numa nodes.
>
> Comment 1:
> When I run "perf c2c report", the "Node" field is marked "N/A". It's supposed to show the numa node where the data address for the cacheline resides. That's important both to see what node hot data resides on and if that data is getting lots of cross-numa accesses.
Good catch. Will fix it.
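The Node column can only be filled in when a sample's data address can be mapped to a NUMA node. As a rough illustration only (not perf's code), such a lookup from a physical data address to a node over a sorted table of per-node address ranges might look like the sketch below. `struct node_range`, `pa_to_node()` and the convention that a zero address means "no physical address recorded" are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical per-node physical address range [start, end).
 * Not perf's actual data structure; used only for illustration.
 */
struct node_range {
	uint64_t start;
	uint64_t end;
	int node;
};

/*
 * Return the NUMA node owning physical address @pa, or -1 when the
 * address is 0 (assumed here to mean "not recorded") or falls outside
 * every range. @ranges must be sorted by @start and non-overlapping,
 * which allows a binary search.
 */
static int pa_to_node(const struct node_range *ranges, size_t n, uint64_t pa)
{
	size_t lo = 0, hi = n;

	assert(ranges != NULL || n == 0);
	if (pa == 0)
		return -1;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (pa < ranges[mid].start)
			hi = mid;		/* search the lower half */
		else if (pa >= ranges[mid].end)
			lo = mid + 1;		/* search the upper half */
		else
			return ranges[mid].node;
	}
	return -1;
}
```

If no physical address is available for a sample, a lookup like this has nothing to work with, which is one plausible way to end up with "N/A" in that column.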
> Comment 2:
> I'm assuming you're identifying the contended cachelines using the "peer" load response, which indicates the load was resolved from a "peer" cpu's cacheline. Please confirm.
Yeah, "peer" is ambiguous. AFAIK, a "peer" load can come from:
- the local node, from a peer CPU's cache (in the same cluster or in a peer cluster);
- a remote node, from a CPU's cache line, or even from *remote DRAM*.
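To make the three cases in the list concrete, here is a minimal classification sketch. It assumes the decoder can tell which node served the data and whether it came from DRAM; those inputs, the enum and `classify_peer()` are hypothetical and not something the SPE data source is confirmed to provide on its own.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical origin of a load that was reported as "peer". */
enum peer_origin {
	PEER_LOCAL_CACHE,	/* peer CPU cache on the local node */
	PEER_REMOTE_CACHE,	/* CPU cache on a remote node */
	PEER_REMOTE_DRAM,	/* DRAM on a remote node */
	PEER_OTHER,		/* combination not covered by the list above */
};

/*
 * Classify a "peer" load from the requesting CPU's node (@cpu_node),
 * the node that served the data (@src_node) and whether the data came
 * from DRAM (@from_dram). All three inputs are assumptions.
 */
static enum peer_origin classify_peer(int cpu_node, int src_node, bool from_dram)
{
	assert(cpu_node >= 0 && src_node >= 0);

	if (src_node == cpu_node)
		return from_dram ? PEER_OTHER : PEER_LOCAL_CACHE;
	return from_dram ? PEER_REMOTE_DRAM : PEER_REMOTE_CACHE;
}
```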
> If that's true, is it possible to identify if that "peer" response was on the local or remote numa node?
Good point. Yes, we can do this. Currently, remote accesses are
accounted in the metric "rmt_hit", which should be the same as the
remote peer loads; but we don't have a metric for local peer loads
yet. It would not be hard to add a metric "lcl_peer".
> I ask because being able to identify both local and remote HitM's on Intel X86_64 has been quite valuable. That's because remote HitM's are costly and because it helps the viewer see if they need to optimize their cpu affinity or what node their hot data resides on.
Thanks a lot for the info. This means I should at least refine the
shared cache line distribution Pareto for remote peer accesses; I will
do some experiments for this enhancement.
> Last Comment:
> There's a row in the Pareto table that has incorrect column alignment.
> Look at row 80 below in the truncated snippet of output. It has an extra field inserted at the beginning.
> I also show what the corrected output should look like.
>
> Incorrect row 80:
> 71 =================================================
> 72 Shared Cache Line Distribution Pareto
> 73 =================================================
> 74 #
> 75 # ----- HITM ----- Snoop ------- Store Refs ------ ------- CL --------
> 76 # RmtHitm LclHitm Peer L1 Hit L1 Miss N/A Off Node PA cnt Code address
> 77 # ....... ....... ....... ....... ....... ....... ..... .... ...... ..................
> 78 #
> 79 -------------------------------------------------------------------------------
> 80 0 0 0 4648 0 0 11572 0x422140
> 81 -------------------------------------------------------------------------------
> 82 0.00% 0.00% 0.00% 0.00% 0.00% 44.47% 0x0 N/A 0 0x400ce8
> 83 0.00% 0.00% 10.26% 0.00% 0.00% 0.00% 0x0 N/A 0 0x400e48
> 84 0.00% 0.00% 0.00% 0.00% 0.00% 55.53% 0x0 N/A 0 0x400e54
> 85 0.00% 0.00% 89.74% 0.00% 0.00% 0.00% 0x8 N/A 0 0x401038
>
>
> Corrected row 80:
> 71 =================================================
> 72 Shared Cache Line Distribution Pareto
> 73 =================================================
> 74 #
> 75 # ----- HITM ----- Snoop ------- Store Refs ----- ------- CL --------
> 76 # RmtHitm LclHitm Peer L1 Hit L1 Miss N/A Off Node PA cnt Code address
> 77 # ....... ....... ....... ....... ....... ...... ..... .... ...... ..................
> 78 #
> 79 -------------------------------------------------------------------------------
> 80 0 0 4648 0 0 11572 0x422140
> 81 -------------------------------------------------------------------------------
> 82 0.00% 0.00% 0.00% 0.00% 0.00% 44.47% 0x0 N/A 0 0x400ce8
> 83 0.00% 0.00% 10.26% 0.00% 0.00% 0.00% 0x0 N/A 0 0x400e48
> 84 0.00% 0.00% 0.00% 0.00% 0.00% 55.53% 0x0 N/A 0 0x400e54
> 85 0.00% 0.00% 89.74% 0.00% 0.00% 0.00% 0x8 N/A 0 0x401038
Hmm... On my side, I used the command below to output the Pareto view, but I
cannot see the column "CL"; the column "CL" is only shown in TUI
mode, not in "--stdio" mode. Could you share how to reproduce
this issue?
$ ./perf c2c report -i perf.data.v3 -N
=================================================
Shared Cache Line Distribution Pareto
=================================================
#
# ----- HITM ----- Snoop ------- Store Refs ------ --------- Data address --------- --------------- cycles --------------- Total cpu Shared
# Num RmtHitm LclHitm Peer L1 Hit L1 Miss N/A Offset Node PA cnt Code address rmt hitm lcl hitm load peer records cnt Symbol Object Source:Line Node{cpus %peers %stores}
# ..... ....... ....... ....... ....... ....... ....... .................. .... ...... .................. ........ ........ ........ ........ ....... ........ ...................... ................. ...................... ....
#
-------------------------------------------------------------------------------
0 0 0 56183 0 0 26534 0x420180
-------------------------------------------------------------------------------
0.00% 0.00% 99.85% 0.00% 0.00% 0.00% 0x0 N/A 0 0x400bd0 0 0 1587 4034 188785 2 [.] 0x0000000000000bd0 false_sharing.exe false_sharing.exe[bd0] 0{ 1 87.4% n/a} 1{ 1 12.6% n/a}
0.00% 0.00% 0.00% 0.00% 0.00% 54.56% 0x0 N/A 0 0x400bd4 0 0 0 0 14476 2 [.] 0x0000000000000bd4 false_sharing.exe false_sharing.exe[bd4] 0{ 1 n/a 0.2%} 1{ 1 n/a 99.8%}
0.00% 0.00% 0.00% 0.00% 0.00% 45.44% 0x0 N/A 0 0x400bf8 0 0 0 0 12058 2 [.] 0x0000000000000bf8 false_sharing.exe false_sharing.exe[bf8] 0{ 1 n/a 70.3%} 1{ 1 n/a 29.7%}
0.00% 0.00% 0.15% 0.00% 0.00% 0.00% 0x20 N/A 0 0x400c64 0 0 2462 2451 4835 2 [.] 0x0000000000000c64 false_sharing.exe false_sharing.exe[c64] 0{ 1 11.9% n/a} 1{ 1 88.1% n/a}
-------------------------------------------------------------------------------
1 0 0 2571 0 0 69861 0x420100
-------------------------------------------------------------------------------
0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0x8 N/A 0 0x400c08 0 0 0 0 69861 2 [.] 0x0000000000000c08 false_sharing.exe false_sharing.exe[c08] 0{ 1 n/a 62.1%} 1{ 1 n/a 37.9%}
0.00% 0.00% 100.00% 0.00% 0.00% 0.00% 0x20 N/A 0 0x400c74 0 0 834 641 6576 2 [.] 0x0000000000000c74 false_sharing.exe false_sharing.exe[c74] 0{ 1 93.2% n/a} 1{ 1 6.8% n/a}
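Comparing the two outputs: in the "--stdio" output above, the header has a "Num" column and each cache line's summary row starts with its index, while in the quoted output the header has no "Num" column but the summary row still carries one extra leading value. That suggests (unconfirmed) the extra field is the cache-line index printed without its header column. A minimal sketch of the usual guard is to drive the header and the summary row from the same flag and return the column counts so a mismatch can be asserted; `print_header()`, `print_summary_row()` and `show_num` are hypothetical, not perf's c2c code, and only the first three data columns are shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Print the header; the optional "Num" column depends on @show_num. */
static int print_header(FILE *out, bool show_num)
{
	int ncols = 0;

	assert(out != NULL);
	if (show_num) {
		fprintf(out, "%5s ", "Num");
		ncols++;
	}
	fprintf(out, "%7s %7s %7s\n", "RmtHitm", "LclHitm", "Peer");
	return ncols + 3;
}

/* Print a cache line's summary row using the same @show_num flag. */
static int print_summary_row(FILE *out, bool show_num, unsigned int idx,
			     unsigned int rmt_hitm, unsigned int lcl_hitm,
			     unsigned int peer)
{
	int ncols = 0;

	assert(out != NULL);
	if (show_num) {
		fprintf(out, "%5u ", idx);
		ncols++;
	}
	fprintf(out, "%7u %7u %7u\n", rmt_hitm, lcl_hitm, peer);
	return ncols + 3;
}
```

A caller can then assert that `print_header()` and `print_summary_row()` report the same column count for a given mode.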
Many thanks for your testing and suggestions!
Leo