[PATCH v2 2/2] perf mem: Support HITM for when mem_lvl_num is used

German Gomez german.gomez at arm.com
Tue Mar 15 11:44:10 PDT 2022


On 14/03/2022 18:37, Ali Saidi wrote:
> Hi German and Leo,
>
> On Mon, 14 Mar 2022 18:00:13 +0000, German Gomez wrote:
>> Hi Leo, Ali
>>
>> On 14/03/2022 06:33, Leo Yan wrote:
>>> On Sun, Mar 13, 2022 at 07:19:33PM +0000, Ali Saidi wrote:
>>>
>>> [...]
>>>
>>>>>>> +			if (lvl & P(LVL, L3) || lnum == P(LVLNUM, L4)) {
>>>>>> According to a comment in the previous patch, using L4 is specific to Neoverse, right?
>>>>>>
>>>>>> Maybe we need to distinguish the Neoverse case from the generic one here as well
>>>>>>
>>>>>> if (is_neoverse)
>>>>>> // treat L4 as llc
>>>>>> else
>>>>>> // treat L3 as llc
>>>>> I personally think it's not a good idea to distinguish platforms in the decoding code.
>>>> I agree here. The more we talk about this, the more I'm wondering if we're
>>>> spending too much code solving a problem that doesn't exist. I know of no
>>>> Neoverse systems that actually have 4 cache levels, they all actually have three
>>>> even though it's technically possible to have four.  I have some doubts anyone
>>>> will actually build four levels of cache and perhaps the most prudent path here
>>>> is to assume only three levels (and adjust the previous patch) until someone 
>>>> actually produces a system with four levels instead of a lot of code that is
>>>> never actually exercised?
>>> I am not the right person to say that L4 cache is not implemented in
>>> Neoverse platforms; my guess is that a "System cache" data source might
>>> be L3 or L4, and that this is implementation dependent.  Maybe German or
>>> Arm mates could confirm this.
>> I had a look at the TRMs for the N1[1], V1[2] and N2[3] Neoverse cores
>> (specifically the LL_CACHE_RD pmu events). If we were to assign a number
>> to the system cache (assuming all caches are implemented):
>>
>> *For N1*, if L2 and L3 are implemented, system cache would follow at *L4*
> To date no one has built 4 levels though. Everyone has only built three.

The N1SDP board advertises 4 cache levels (we use it regularly for testing perf patches):

| $ cat /sys/devices/system/cpu/cpu0/cache/index4/{level,shared_cpu_list}
| 4
| 0-3

Would it be a good idea to obtain the system cache level# from sysfs?
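
As a rough sketch of what reading the level from sysfs could look like
(not code from this series): the helper below walks the index<N>
directories under a CPU's cache directory and returns the highest value
found in the "level" files. The name max_cache_level() is hypothetical,
and treating the highest advertised level as the system cache is an
assumption that would need checking per platform. It also assumes the
index<N> entries are numbered without gaps.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Return the highest cache level listed under cache_dir (for example
 * "/sys/devices/system/cpu/cpu0/cache"), or -1 if no level could be read.
 * Hypothetical helper for illustration only; it is not part of perf.
 */
int max_cache_level(const char *cache_dir)
{
	int max_level = -1;

	/* Assumption: entries are named index0, index1, ... without gaps. */
	for (int idx = 0; ; idx++) {
		char path[4096];
		char buf[32];
		FILE *f;
		int len;

		len = snprintf(path, sizeof(path), "%s/index%d/level",
			       cache_dir, idx);
		if (len < 0 || len >= (int)sizeof(path))
			break;	/* path would not fit in the buffer */

		f = fopen(path, "r");
		if (!f)
			break;	/* no further index<N> directory */

		if (fgets(buf, sizeof(buf), f)) {
			long level = strtol(buf, NULL, 10);

			/* Ignore unparsable or non-positive values. */
			if (level >= 1 && level > max_level)
				max_level = (int)level;
		}
		fclose(f);
	}

	return max_level;
}
```

Given the index4 output above, calling
max_cache_level("/sys/devices/system/cpu/cpu0/cache") on the N1SDP would
be expected to return at least 4; on a system without that sysfs
directory it returns -1.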

>> *For V1 and N2*, if L2 is implemented, system cache would follow at *L3*
>> (these don't seem to have the same/similar per-cluster L3 cache from the N1)
> And in the future they're not able to build >3. German and Leo if there aren't
> strong objections I think the best path forward is for me to respin these
> assuming only 3 levels and if someone builds 4 in a far-off-future we can always
> change the implementation then. Agreed?
>
> Thanks,
> Ali
>


