[PATCH v3 0/2] perf: arm-spe: Decode SPE source and use for perf c2c

German Gomez german.gomez at arm.com
Tue Mar 22 05:05:46 PDT 2022


Hi Ali, thank you for your patches.

On 18/03/2022 19:59, Ali Saidi wrote:
> When synthesizing data from SPE, augment the type with source information
> for Arm Neoverse cores so we can detect situations like cache line contention
> and transfers on Arm platforms. 
>
> This change enables the expected behavior of perf c2c on a system with SPE where
> lines that are shared among multiple cores show up in perf c2c output.
>
> These changes switch to use mem_lvl_num to encode the level information instead
> of mem_lvl which is being deprecated, but I haven't found other users of
> mem_lvl_num. 
>
> Changes in v3:
>   * Assume there are only three levels of cache hierarchy
>   * Split the mem_lvl_num and HITM changes in c2c into two separate patches
>
> Ali Saidi (3):
>   perf arm-spe: Use SPE data source for neoverse cores
>   perf mem: Support mem_lvl_num in c2c command
>   perf mem: Support HITM for when mem_lvl_num is any
>
>  .../util/arm-spe-decoder/arm-spe-decoder.c    |   1 +
>  .../util/arm-spe-decoder/arm-spe-decoder.h    |  12 ++
>  tools/perf/util/arm-spe.c                     | 109 +++++++++++++++---
>  tools/perf/util/mem-events.c                  |  20 +++-
>  4 files changed, 124 insertions(+), 18 deletions(-)
>

I tested on a Neoverse N1 system using the commands below, and the output
looks either unchanged or improved compared to before. For example:

| $ perf mem record -e spe-ldst -a -- sleep 4
| $ perf mem report
|
| 1.39%             1  1263          L3 miss                   [k] 0xffffb9a34bda2088
| 0.58%             1  529           L1 miss                   [k] 0xffffb9a34bd3be7c
| 0.34%             1  310           N/A                       [k] 0xffffb9a34baf4d28
| 0.34%             1  309           N/A                       [k] 0xffffb9a34bb82844

... became:

| 1.39%             1  1263          RAM hit                   [k] 0xffffb9a34bda2088
| 0.58%             1  529           L2 hit                    [k] 0xffffb9a34bd3be7c
| 0.34%             1  310           L1 hit                    [k] 0xffffb9a34baf4d28
| 0.34%             1  309           L1 hit                    [k] 0xffffb9a34bb82844

Also, some L3 misses are now labeled as "Any cache hit" with the Snoop
bit set. For example:

| 0.37%             1  332           L3 miss                   [.] 0x0000aaaadf70a700    N/A

... became:                                                           

| 0.37%             1  332           Any cache hit             [.] 0x0000aaaadf70a700    HitM
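To illustrate what the last example reflects, here is a minimal sketch of how a sample's data source can carry the "Any cache" level in mem_lvl_num together with the HITM snoop bit. It uses the union perf_mem_data_src and the PERF_MEM_* constants from the uapi header <linux/perf_event.h>; the helper names make_data_src(), lvl_num_name() and is_hitm() are hypothetical and are not taken from the patches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <linux/perf_event.h>

/* Hypothetical helper: build a load data source with a level and snoop bits. */
static uint64_t make_data_src(unsigned int lvl_num, unsigned int snoop)
{
	union perf_mem_data_src d = { .val = 0 };

	assert(lvl_num <= 0xf);	/* mem_lvl_num is a 4-bit field */
	d.mem_op = PERF_MEM_OP_LOAD;
	d.mem_lvl_num = lvl_num;
	d.mem_snoop = snoop;
	return d.val;
}

/* Hypothetical helper: name the level stored in mem_lvl_num. */
static const char *lvl_num_name(uint64_t val)
{
	union perf_mem_data_src d = { .val = val };

	switch (d.mem_lvl_num) {
	case PERF_MEM_LVLNUM_L1:	return "L1";
	case PERF_MEM_LVLNUM_L2:	return "L2";
	case PERF_MEM_LVLNUM_L3:	return "L3";
	case PERF_MEM_LVLNUM_ANY_CACHE:	return "Any cache";
	case PERF_MEM_LVLNUM_RAM:	return "RAM";
	default:			return "N/A";
	}
}

/* Hypothetical helper: true if the snoop field reports a hit on a modified line. */
static bool is_hitm(uint64_t val)
{
	union perf_mem_data_src d = { .val = val };

	return (d.mem_snoop & PERF_MEM_SNOOP_HITM) != 0;
}
```

With these helpers, make_data_src(PERF_MEM_LVLNUM_ANY_CACHE, PERF_MEM_SNOOP_HITM) gives a value for which lvl_num_name() returns "Any cache" and is_hitm() returns true, which corresponds to the "Any cache hit ... HitM" line above. How arm-spe.c actually fills these fields is defined by the patches, not by this sketch.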

Tested-by: German Gomez <german.gomez at arm.com>
Reviewed-by: German Gomez <german.gomez at arm.com>

Thanks,
German

(I didn't test on a non-Neoverse system, but it doesn't look like any
behaviour is changed for those.)
