[PATCH] perf arm-spe: Add support for SPE Data Source packet on HiSilicon HIP12

Yicong Yang yangyicong at huawei.com
Wed Apr 23 00:57:52 PDT 2025


On 2025/4/22 21:20, Leo Yan wrote:
> On Tue, Apr 22, 2025 at 08:31:43PM +0800, Yicong Yang wrote:
> 
> [...]
> 
>>>>> +	case ARM_SPE_HISI_HIP_PEER_CLUSTER:
>>>>> +		data_src->mem_lvl = PERF_MEM_LVL_REM_CCE1 | PERF_MEM_LVL_HIT;
>>>>> +		data_src->mem_lvl_num = PERF_MEM_LVLNUM_L3;
>>>
>>> It seems to me that a CPU has an L3 cache; would the cluster have a
>>> higher-level cache?
>>
>> In my case, the CPUs in a cluster share the L3 cache and there are several
>> clusters. L3 is the highest-level cache in the system.
> 
> If so, you might need to revise the cache levels for:
> 
>   ARM_SPE_HISI_HIP_PEER_CPU
>   ARM_SPE_HISI_HIP_PEER_CPU_HITM
> 
> IIUC, cluster CPUs share the L3 cache, and every CPU in a cluster has its
> own L1/L2 cache, so for the PEER_CPU cases the memory level should be L2.
> 

Confirmed with our hardware people: it should be L2 for these two data sources.
I misunderstood this, thanks for pointing it out.
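
Something like the below for the next version. This is only a sketch to show
the intended levels: the enum names are placeholders rather than the real HIP12
encodings, and the snoop fields for these two sources are my assumption, they
are not quoted above.

```c
#include <linux/perf_event.h>

/* Older uapi headers may lack this flag; 0x02 matches the upstream value. */
#ifndef PERF_MEM_SNOOPX_PEER
#define PERF_MEM_SNOOPX_PEER 0x02
#endif

/* Placeholder names for illustration only, not the real HIP12 encodings. */
enum sketch_hip_src {
	SKETCH_HIP_PEER_CPU,
	SKETCH_HIP_PEER_CPU_HITM,
};

/* Fill data_src for the two peer CPU sources, reported as L2 instead of L3. */
static void sketch_hip_peer_cpu(enum sketch_hip_src src,
				union perf_mem_data_src *data_src)
{
	switch (src) {
	case SKETCH_HIP_PEER_CPU:
		data_src->mem_lvl = PERF_MEM_LVL_L2 | PERF_MEM_LVL_HIT;
		data_src->mem_lvl_num = PERF_MEM_LVLNUM_L2;
		/* Assumption: only the peer information is available here. */
		data_src->mem_snoopx = PERF_MEM_SNOOPX_PEER;
		break;
	case SKETCH_HIP_PEER_CPU_HITM:
		data_src->mem_lvl = PERF_MEM_LVL_L2 | PERF_MEM_LVL_HIT;
		data_src->mem_lvl_num = PERF_MEM_LVLNUM_L2;
		/* Assumption: the HITM variant carries both line status and origin. */
		data_src->mem_snoop = PERF_MEM_SNOOP_HITM;
		data_src->mem_snoopx = PERF_MEM_SNOOPX_PEER;
		break;
	}
}
```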

> [...]
> 
>>>>> +	case ARM_SPE_HISI_HIP_REMOTE_SOCKET:
>>>>> +		data_src->mem_lvl = PERF_MEM_LVL_REM_CCE2;
>>>>> +		data_src->mem_lvl_num = PERF_MEM_LVLNUM_ANY_CACHE;
>>>>> +		data_src->mem_remote = PERF_MEM_REMOTE_REMOTE;
>>>>> +		data_src->mem_snoopx = PERF_MEM_SNOOPX_PEER;
>>>>
>>>> Hi Yicong,
>>>>
>>>> Is the mem_snoop setting missing from this one?
>>>
>>> The field 'mem_snoopx' is an extension to the field 'mem_snoop'.
>>>
>>> If the field 'mem_snoopx' is set, there is no need to set 'mem_snoop'.
>>>
>>
>> They should not be mutually exclusive. mem_snoopx provides the information where
>> the cacheline comes from, while mem_snoop provides the status of the cacheline.
>> If the hardware supports it, we can gather both from the data source, like
>> above for ARM_SPE_HISI_HIP_PEER_CLUSTER_HITM.
> 
> My understanding is that the PERF_MEM_SNOOPX_PEER flag was added to
> support Arm SPE.  The other snoop flags originally came from the x86 arch.
> 
> I agree that in some of the cases above, both the PERF_MEM_SNOOPX_PEER
> and PERF_MEM_SNOOP_HITM flags can be set together, and you can then parse
> cache sharing with different --display options:
> 
>   perf c2c report --display tot    => based on HITM flags
>   perf c2c report --display peer   => based on SNOOPX_PEER flag
> 

that's exactly what we want to support :)

>> For other cases, if there's mem_snoopx information I think mem_snoop can be dropped;
>> this won't make a difference. I checked c2c_decode_stats(): only PERF_MEM_SNOOP_HIT and
>> PERF_MEM_SNOOP_HITM are useful when summarizing c2c statistics.
> 
> It is about how to present accurate results.
> 
> E.g., for the REMOTE_SOCKET type, it is hard to say whether the data comes
> from remote DRAM or from some level of cache.  Since further hardware details
> are absent, I suggested not setting 'mem_snoop' for REMOTE_SOCKET.
> 

This makes sense. Will drop mem_snoop if there is no indication from the data source.
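
So for REMOTE_SOCKET it would look roughly like the below, with mem_snoop
simply left unset. Only a sketch: the function name is a placeholder, and the
field values are the ones from the hunk quoted above.

```c
#include <linux/perf_event.h>

/* Older uapi headers may lack this flag; 0x02 matches the upstream value. */
#ifndef PERF_MEM_SNOOPX_PEER
#define PERF_MEM_SNOOPX_PEER 0x02
#endif

/*
 * Remote socket: the data source does not tell us whether the line came
 * from remote DRAM or from some cache level, so mem_snoop is not touched.
 */
static void sketch_hip_remote_socket(union perf_mem_data_src *data_src)
{
	data_src->mem_lvl = PERF_MEM_LVL_REM_CCE2;
	data_src->mem_lvl_num = PERF_MEM_LVLNUM_ANY_CACHE;
	data_src->mem_remote = PERF_MEM_REMOTE_REMOTE;
	data_src->mem_snoopx = PERF_MEM_SNOOPX_PEER;
}
```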

Thanks.



