[PATCH] perf arm-spe: Add support for SPE Data Source packet on HiSilicon HIP12
Yicong Yang
yangyicong at huawei.com
Thu Apr 24 04:57:39 PDT 2025
Hi Leo,
On 2025/4/23 15:57, Yicong Yang wrote:
> On 2025/4/22 21:20, Leo Yan wrote:
>> On Tue, Apr 22, 2025 at 08:31:43PM +0800, Yicong Yang wrote:
>>
>> [...]
>>
>>>>>> + case ARM_SPE_HISI_HIP_PEER_CLUSTER:
>>>>>> + data_src->mem_lvl = PERF_MEM_LVL_REM_CCE1 | PERF_MEM_LVL_HIT;
>>>>>> + data_src->mem_lvl_num = PERF_MEM_LVLNUM_L3;
>>>>
>>>> It seems to me that a CPU has an L3 cache; would the cluster have a
>>>> higher-level cache?
>>>
>>> In my case, the cluster CPUs share the L3 cache and there are several clusters.
>>> L3 is the highest-level cache in the system.
>>
>> If so, you might need to revise the cache levels for:
>>
>> ARM_SPE_HISI_HIP_PEER_CPU
>> ARM_SPE_HISI_HIP_PEER_CPU_HITM
>>
>> IIUC, cluster CPUs share L3 cache, and every CPU in a cluster has
>> L1/L2 cache, for PEER_CPU cases, the memory level should be L2.
>>
>
> confirmed with our hardware people, should be L2 for these two data sources.
> I misunderstood here, thanks for pointing it out.
>
I recalled why the handling is like this. Consider two threads with a potential false
sharing issue, running on core 0 thread 0 and core 1 thread 0 in the same cluster.
We'll get some ARM_SPE_HISI_HIP_PEER_CPU_HITM samples indicating the cacheline
contention. If the cache level is L2, we cannot observe this with
`perf c2c report -d tot`, since L2 is not counted for HITM.

Does it make sense to have the change below, which accounts L2 HITM as lcl_hitm, just
like we account L2 peer snoop as lcl_peer?

diff --git a/tools/perf/util/mem-events.c b/tools/perf/util/mem-events.c
index 884d9aebce91..a384a866a562 100644
--- a/tools/perf/util/mem-events.c
+++ b/tools/perf/util/mem-events.c
@@ -680,7 +680,10 @@ do { \
 			if (lvl & P(LVL, LFB)) stats->ld_fbhit++;
 			if (lvl & P(LVL, L1 )) stats->ld_l1hit++;
 			if (lvl & P(LVL, L2)) {
-				stats->ld_l2hit++;
+				if (snoop & P(SNOOP, HITM))
+					HITM_INC(lcl_hitm);
+				else
+					stats->ld_l2hit++;
 
 				if (snoopx & P(SNOOPX, PEER))
 					PEER_INC(lcl_peer);

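
To make the effect of the change above easier to see, here is a minimal standalone
sketch of the proposed accounting. It is not the perf code: the sk_* names, the flag
values and the stats struct are hypothetical stand-ins, and it assumes HITM_INC() and
PEER_INC() bump a total counter alongside the per-type one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in flag values for this sketch only; NOT the UAPI PERF_MEM_* values. */
enum { SK_LVL_HIT = 1u << 0, SK_LVL_L1 = 1u << 1, SK_LVL_L2 = 1u << 2 };
enum { SK_SNOOP_HITM = 1u << 0 };
enum { SK_SNOOPX_PEER = 1u << 0 };

/* Hypothetical subset of the c2c load statistics. */
struct sk_stats {
	uint32_t ld_l1hit;
	uint32_t ld_l2hit;
	uint32_t lcl_hitm;
	uint32_t tot_hitm;
	uint32_t lcl_peer;
	uint32_t tot_peer;
};

/* Account one load sample following the proposed L2 handling. */
static void sk_decode_load(struct sk_stats *stats, uint32_t lvl,
			   uint32_t snoop, uint32_t snoopx)
{
	assert(stats != NULL);

	if (!(lvl & SK_LVL_HIT))
		return;

	if (lvl & SK_LVL_L1)
		stats->ld_l1hit++;

	if (lvl & SK_LVL_L2) {
		if (snoop & SK_SNOOP_HITM) {
			/* Proposed: an L2 HITM counts as a local HITM. */
			stats->lcl_hitm++;
			stats->tot_hitm++;
		} else {
			stats->ld_l2hit++;
		}

		/* Unchanged: an L2 peer snoop counts as a local peer. */
		if (snoopx & SK_SNOOPX_PEER) {
			stats->lcl_peer++;
			stats->tot_peer++;
		}
	}
}
```

With this accounting, an L2 sample carrying both HITM and PEER adds to lcl_hitm and
lcl_peer but no longer to ld_l2hit, while a plain L2 hit is still counted as ld_l2hit.
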
>> [...]
>>
>>>>>> + case ARM_SPE_HISI_HIP_REMOTE_SOCKET:
>>>>>> + data_src->mem_lvl = PERF_MEM_LVL_REM_CCE2;
>>>>>> + data_src->mem_lvl_num = PERF_MEM_LVLNUM_ANY_CACHE;
>>>>>> + data_src->mem_remote = PERF_MEM_REMOTE_REMOTE;
>>>>>> + data_src->mem_snoopx = PERF_MEM_SNOOPX_PEER;
>>>>>
>>>>> Hi Yicong,
>>>>>
>>>>> Is the mem_snoop setting missing from this one?
>>>>
>>>> The field 'mem_snoopx' is an extension to the field 'mem_snoop'.
>>>>
>>>> If the field 'mem_snoopx' is set, there is no need to set 'mem_snoop'.
>>>>
>>>
>>> they should not be mutually exclusive. mem_snoopx provides the information where
>>> the cacheline comes from while mem_snoop provides the status of the cacheline.
>>> if the hardware supports it we can gather both pieces of information from the data
>>> source, like above for ARM_SPE_HISI_HIP_PEER_CLUSTER_HITM.
>>
>> My understanding is that the PERF_MEM_SNOOPX_PEER flag was added to
>> support Arm SPE. The other snoop flags originally came from the x86 arch.
>>
>> I agree that in some cases above, both the flags PERF_MEM_SNOOPX_PEER
>> and PERF_MEM_SNOOP_HITM can be set together, and you can parse cache sharing
>> with different --display options:
>>
>> perf c2c report --display tot => based on HITM flags
>> perf c2c report --display peer => based on SNOOPX_PEER flag
>>
>
> that's exactly what we want to support :)
>
>>> for other cases, if there's mem_snoopx information I think mem_snoop can be dropped;
>>> this won't make a difference. Checked c2c_decode_stats(), only PERF_MEM_SNOOP_HIT and
>>> PERF_MEM_SNOOP_HITM are useful when summarizing c2c statistics.
>>
>> It is about how to present accurate results.
>>
>> E.g., for the REMOTE_SOCKET type, it is hard to say whether the data comes from
>> remote DRAM or from some level of cache. Since more hardware details are absent,
>> this is why I suggested not setting 'mem_snoop' for REMOTE_SOCKET.
>>
>
> this makes sense. will drop mem_snoop if no indications from the data source.
>
> Thanks.
>
> .
>
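
As a rough illustration of the mem_snoop / mem_snoopx discussion above, the sketch
below fills the two fields independently for two of the data sources mentioned in
this thread. Everything here is a hypothetical stand-in (the sk_* names, the flag
values, the struct, and the two predicates that loosely model the `--display tot`
and `--display peer` views); it is not the arm-spe synthesis code, and the real c2c
accounting also depends on the memory level.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in flag values for this sketch only; NOT the UAPI PERF_MEM_* values. */
enum { SK_SNOOP_NONE = 0, SK_SNOOP_HIT = 1u << 0, SK_SNOOP_HITM = 1u << 1 };
enum { SK_SNOOPX_NONE = 0, SK_SNOOPX_PEER = 1u << 0 };

/* Two of the data sources discussed in the thread. */
enum sk_source { SK_SRC_PEER_CLUSTER_HITM, SK_SRC_REMOTE_SOCKET };

/* Hypothetical, simplified view of the synthesized data source. */
struct sk_data_src {
	uint8_t mem_snoop;	/* status of the cacheline */
	uint8_t mem_snoopx;	/* where the cacheline came from */
	bool mem_remote;
};

static struct sk_data_src sk_synth(enum sk_source src)
{
	struct sk_data_src d = { SK_SNOOP_NONE, SK_SNOOPX_NONE, false };

	switch (src) {
	case SK_SRC_PEER_CLUSTER_HITM:
		/* Hardware reports both the peer origin and the HITM status. */
		d.mem_snoop = SK_SNOOP_HITM;
		d.mem_snoopx = SK_SNOOPX_PEER;
		break;
	case SK_SRC_REMOTE_SOCKET:
		/* No cacheline status from the data source: leave mem_snoop unset. */
		d.mem_remote = true;
		d.mem_snoopx = SK_SNOOPX_PEER;
		break;
	}
	return d;
}

/* Loosely models a view based on HITM flags. */
static bool sk_seen_by_display_tot(const struct sk_data_src *d)
{
	assert(d != NULL);
	return (d->mem_snoop & SK_SNOOP_HITM) != 0;
}

/* Loosely models a view based on the SNOOPX_PEER flag. */
static bool sk_seen_by_display_peer(const struct sk_data_src *d)
{
	assert(d != NULL);
	return (d->mem_snoopx & SK_SNOOPX_PEER) != 0;
}
```

In this sketch a PEER_CLUSTER_HITM sample shows up in both views, while a
REMOTE_SOCKET sample shows up only in the peer view because its mem_snoop is left
unset.
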