[PATCH v8 0/4] perf: arm-spe: Decode SPE source and use for perf c2c

Leo Yan leo.yan at linaro.org
Sat May 21 23:15:33 PDT 2022


Hi Joe,

On Thu, May 19, 2022 at 11:16:53AM -0400, Joe Mario wrote:

[...]

> Hi Leo:
> Thanks for getting this working on ARM.  I do have a few comments.
> 
> I built and ran this on an ARM Neoverse-N1 system with 2 numa nodes.
> 
> Comment 1:
> When I run "perf c2c report", the "Node" field is marked "N/A".  It's supposed to show the numa node where the data address for the cacheline resides.  That's important both to see what node hot data resides on and if that data is getting lots of cross-numa accesses. 

Good catch.  Will fix it.

> Comment 2:
> I'm assuming you're identifying the contended cachelines using the "peer" load response, which indicates the load was resolved from a "peer" cpu's cacheline.  Please confirm.

Yeah, "peer" is ambiguous.  AFAIK, a "peer" load can come from:
- The local node, from a peer CPU's cache (in the same cluster or a
  peer cluster);
- A remote node, from a CPU's cache line or even from *remote DRAM*.

> If that's true, is it possible to identify if that "peer" response was on the local or remote numa node?  

Good point.  Yes, we can do this.  So far, remote accesses are
accounted in the metric "rmt_hit", which should match the remote peer
loads; but we have no metric yet to account local peer loads.  It
would not be hard to add a metric "lcl_peer".
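
To make the local/remote split concrete, here is a minimal sketch of how
a peer load could be accounted into separate counters.  This is not the
perf implementation: the flag names, the stats struct and the helper are
all hypothetical, and the real code would have to derive locality from
the decoded SPE data source.

```c
#include <stdint.h>

/*
 * Hypothetical flags, for illustration only.  These are NOT the
 * perf_mem_data_src encodings used by perf.
 */
enum {
	SKETCH_SNOOP_PEER = 1u << 0,	/* load served from a peer CPU's cache */
	SKETCH_REMOTE     = 1u << 1,	/* data came from a remote NUMA node */
};

/* Hypothetical counters; "lcl_peer" is the metric proposed above. */
struct sketch_stats {
	uint32_t lcl_peer;	/* peer loads resolved on the local node */
	uint32_t rmt_peer;	/* peer loads resolved on a remote node */
};

/* Account one load sample; loads that are not peer loads are ignored. */
static void sketch_account_load(struct sketch_stats *st, uint32_t flags)
{
	if (!(flags & SKETCH_SNOOP_PEER))
		return;

	if (flags & SKETCH_REMOTE)
		st->rmt_peer++;
	else
		st->lcl_peer++;
}
```

With counters split like this, the report could show local and remote
peer loads in separate columns, in the same way it shows LclHitm and
RmtHitm on x86.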

> I ask because being able to identify both local and remote HitM's on Intel X86_64 has been quite valuable.  That's because remote HitM's are costly and because it helps the viewer see if they need to optimize their cpu affinity or what node their hot data resides on.

Thanks a lot for the info.  This means I should at least refine the shared
cache line distribution pareto for remote peer accesses; I will do some
experiments for the enhancement.

> Last Comment:
> There's a row in the Pareto table that has incorrect column alignment.
> Look at row 80 below in the truncated snippet of output.  It has an extra field inserted in it at the beginning.
> I also show what the corrected output should look like.
> 
> Incorrect row 80:
>     71	=================================================
>     72	      Shared Cache Line Distribution Pareto      
>     73	=================================================
>     74	#
>     75	# ----- HITM -----    Snoop  ------- Store Refs ------  ------- CL --------                      
>     76	# RmtHitm  LclHitm     Peer   L1 Hit  L1 Miss      N/A    Off  Node  PA cnt        Code address
>     77	# .......  .......  .......  .......  .......  .......  .....  ....  ......  ..................
>     78	#
>     79	  -------------------------------------------------------------------------------
>     80	      0        0        0     4648        0        0    11572            0x422140
>     81	  -------------------------------------------------------------------------------
>     82	    0.00%    0.00%    0.00%    0.00%    0.00%   44.47%    0x0   N/A       0            0x400ce8
>     83	    0.00%    0.00%   10.26%    0.00%    0.00%    0.00%    0x0   N/A       0            0x400e48
>     84	    0.00%    0.00%    0.00%    0.00%    0.00%   55.53%    0x0   N/A       0            0x400e54
>     85	    0.00%    0.00%   89.74%    0.00%    0.00%    0.00%    0x8   N/A       0            0x401038
> 
> 
> Corrected row 80:
>     71	=================================================
>     72	      Shared Cache Line Distribution Pareto      
>     73	=================================================
>     74	#
>     75	# ----- HITM -----    Snoop  ------- Store Refs -----   ------- CL --------                       
>     76	# RmtHitm  LclHitm     Peer   L1 Hit  L1 Miss     N/A     Off  Node  PA cnt        Code address
>     77	# .......  .......  .......  .......  .......  ......   .....  ....  ......  ..................
>     78	#
>     79	  -------------------------------------------------------------------------------
>     80	       0        0     4648        0        0    11572            0x422140
>     81	  -------------------------------------------------------------------------------
>     82	    0.00%    0.00%    0.00%    0.00%    0.00%   44.47%    0x0   N/A       0            0x400ce8
>     83	    0.00%    0.00%   10.26%    0.00%    0.00%    0.00%    0x0   N/A       0            0x400e48
>     84	    0.00%    0.00%    0.00%    0.00%    0.00%   55.53%    0x0   N/A       0            0x400e54
>     85	    0.00%    0.00%   89.74%    0.00%    0.00%    0.00%    0x8   N/A       0            0x401038

Hmm...  On my side, I used the command below to output the pareto view, but I
cannot see the column "CL"; it is only shown in TUI mode, not in
"--stdio" mode.  Could you share how to reproduce this issue?

$ ./perf c2c report -i perf.data.v3 -N

=================================================
      Shared Cache Line Distribution Pareto      
=================================================
#
#        ----- HITM -----    Snoop  ------- Store Refs ------  --------- Data address ---------                      --------------- cycles ---------------    Total       cpu                                     Shared                              
#   Num  RmtHitm  LclHitm     Peer   L1 Hit  L1 Miss      N/A              Offset  Node  PA cnt        Code address  rmt hitm  lcl hitm      load      peer  records       cnt                  Symbol             Object             Source:Line  Node{cpus %peers %stores}
# .....  .......  .......  .......  .......  .......  .......  ..................  ....  ......  ..................  ........  ........  ........  ........  .......  ........  ......................  .................  ......................  ....
#
  -------------------------------------------------------------------------------
      0        0        0    56183        0        0    26534            0x420180
  -------------------------------------------------------------------------------
           0.00%    0.00%   99.85%    0.00%    0.00%    0.00%                 0x0   N/A       0            0x400bd0         0         0      1587      4034   188785         2  [.] 0x0000000000000bd0  false_sharing.exe  false_sharing.exe[bd0]   0{ 1  87.4%    n/a}  1{ 1  12.6%    n/a}
           0.00%    0.00%    0.00%    0.00%    0.00%   54.56%                 0x0   N/A       0            0x400bd4         0         0         0         0    14476         2  [.] 0x0000000000000bd4  false_sharing.exe  false_sharing.exe[bd4]   0{ 1    n/a   0.2%}  1{ 1    n/a  99.8%}
           0.00%    0.00%    0.00%    0.00%    0.00%   45.44%                 0x0   N/A       0            0x400bf8         0         0         0         0    12058         2  [.] 0x0000000000000bf8  false_sharing.exe  false_sharing.exe[bf8]   0{ 1    n/a  70.3%}  1{ 1    n/a  29.7%}
           0.00%    0.00%    0.15%    0.00%    0.00%    0.00%                0x20   N/A       0            0x400c64         0         0      2462      2451     4835         2  [.] 0x0000000000000c64  false_sharing.exe  false_sharing.exe[c64]   0{ 1  11.9%    n/a}  1{ 1  88.1%    n/a}

  -------------------------------------------------------------------------------
      1        0        0     2571        0        0    69861            0x420100
  -------------------------------------------------------------------------------
           0.00%    0.00%    0.00%    0.00%    0.00%  100.00%                 0x8   N/A       0            0x400c08         0         0         0         0    69861         2  [.] 0x0000000000000c08  false_sharing.exe  false_sharing.exe[c08]   0{ 1    n/a  62.1%}  1{ 1    n/a  37.9%}
           0.00%    0.00%  100.00%    0.00%    0.00%    0.00%                0x20   N/A       0            0x400c74         0         0       834       641     6576         2  [.] 0x0000000000000c74  false_sharing.exe  false_sharing.exe[c74]   0{ 1  93.2%    n/a}  1{ 1   6.8%    n/a}

I really appreciate your testing and suggestions!

Leo
