[PATCH v2 2/5] perf jevents: Match on highest version of Arm json file available

James Clark james.clark at arm.com
Wed Jul 12 07:15:15 PDT 2023



On 12/07/2023 15:06, James Clark wrote:
> 
> 
> On 12/07/2023 12:33, John Garry wrote:
>> On 12/07/2023 12:00, James Clark wrote:
>>>
>>>
>>> On 12/07/2023 10:22, John Garry wrote:
>>>> On 11/07/2023 11:18, James Clark wrote:
>>>>>>> The highest valid version of json files should be used,
>>>>>> What exactly does that mean?
>>>>>>
>>>>>> So it seems that you have CPUs with matching MIDR except variant and
>>>>>> revision, but have different events, right?
>>>>> Yes. In this case we changed how a metric is calculated in N2-r0p3
>>>>> because the workaround that was needed for r0p0 is no longer needed
>>>>> (CPU_CYCLES should not subtracted from stalls for topdown metrics
>>>>> anymore).
>>>>
>>>> If there are only very subtle differences, then could we solve with
>>>> metric expression function? We already do something else like this for
>>>> literal #slots for arm64
>>>
>>> Possibly, but I'd have to add a new mechanism to expose the version
>>> numbers to the metrics so it can be used in an expression. And I think
>>> it could get very messy quite quickly and make the expression hard to
>>> read. In fact in this specific case I'm not even sure how it would look.
>>> Something like this (where #p is the p in r0p3)?
>>>
>>>   {
>>>   "ArchStdEvent": "bad_speculation",
>>>   "MetricExpr": "(100 * (((1 - (OP_RETIRED / OP_SPEC)) * (1 -
>>> ((STALL_SLOT - (CPU_CYCLES if (#p >= 3) else 0) / (CPU_CYCLES *
>>> #slots)))) + ((BR_MIS_PRED * 4) / CPU_CYCLES)))"
>>
>> Could this formula for "CPU_CYCLES if (#p >= 3) else 0" be generalised
>> with a literal (like what we do for #slots)?
>>
> 
> Yes that could work, then we wouldn't need the logic inside the formula.
> 
>>>   },
>>>
>>> In this case it's not so bad because we don't need to compare r as well,
>>> but I think it could get worse if r was non 0 or there were more
>>> significant differences to the metric.
>>>
>>> It also doesn't scale in the same way when we're using auto generated
>>> metrics with the data that Arm is publishing. We're going to treat CPU
>>> versions as a completely new set of JSONs if there are any
>>> differences/fixes in the PMU stuff and just publish a whole new set.
>>> That way any tool that's consuming these can just check the MIDR and use
>>> whichever JSONS are most appropriate.
>>
>> So far we have got away with ignoring revision and variant - why change
>> this now just for these arm parts is a good question..
>>
>>> In theory this shouldn't require
>>> any manual intervention or re-writing of expressions. And it saves every
>>> tool having to solve the same problem.
>>
>> Sure, but we have tried to reduce duplication as much as possible, and
>> example would be ArchStdEvents
>>
> 
> What's the end goal of reducing duplication? Is it to reduce the binary
> size or to reduce human input?
> 
> For the binary size if most of these strings are the same then they are
> de-duplicated and the impact on the binary size is very small.
> 
> If it's to save on human input, then not doing it this way has the
> potential to be more work in the long run. At the moment we can just run
> a script and generate the Arm JSONs using what Arm is publishing. We
> might understand all these subtle de-duplication efforts, but anyone
> coming in the future is going to have a hard time understanding.
> Especially if they just want to re-run the script to update some text,
> then there is a big risk of making a mistake when re-applying the manual
> changes.
> 
> I'm not completely opposed to the idea of re-working this expression
> this time so that it works in both places, but I think we should be sure
> that we do it for the right reasons.
> 
> As an example for comparison, I tried a build with a single set of
> events and both sets (from this change) and the difference in size is
> 1760 bytes. Is it really worth starting to edit formulas just to save
> that much space?
> 

I re-ran the test with a clean build and the difference is actually 4888
bytes. Still quite small but not as suspiciously small as 1760.

>>>
>>> It's possible in the future that there are also small changes to the
>>> description text or availability of the events between versions. It
>>> hasn't happened yet, but it gives us the flexibility to do that too.
>>
>> Here the (handful of) metrics are subtly different between these
>> rev/variant, but all the events are the same. It's hard to justify
>> duplicating almost everything just for that.
>>
> 
> If it saves time and reduces the chance of making mistakes then
> personally I don't see an issue with the duplication.
> 
>>>
>>
>> I am not sure on how to handle this, since we might be bombarded with
>> support for more CPUs with the same MIDR issue and having quirks for all
>> becomes unworkable.
>>
> 
> I really don't see this being used very often. And it will always be
> used much less often than the task of adding entirely new CPUs, so it
> won't cause the number of JSONs to grow faster than that already
> existing work. To be honest there is a non-zero chance that it never
> gets used again.
> 
> Although even if we never use it again we still might want to
> re-generate the JSONS because of a typo fix or something so that task is
> just a script run rather than script run, and then re-applying the
> expression fixes.
> 
>> Can you just give the idea above on literals/functions in the metricexpr
>> a go, to see how it looks? Maybe Ian has a good idea on similar solutions.
>>
>> Thanks,
>> John
> 
> Yes I can give it a go. Also interested to see what Ian thinks as well.
> 
> Thanks
> James



More information about the linux-arm-kernel mailing list