[PATCH 2/2] perf arm-spe: Parse more SPE fields and store source

German Gomez german.gomez at arm.com
Fri Feb 25 04:40:38 PST 2022


On 22/02/2022 19:29, Ali Saidi wrote:
> Hi German & Yan,
>
> Sorry about the delay in responding.
>
>> Hi German, Ali,
>>
> [...]
>>>>>  };
>>>>>>  
>>>>>>  enum arm_spe_op_type {
>>>>>>  	ARM_SPE_LD		= 1 << 0,
>>>>>>  	ARM_SPE_ST		= 1 << 1,
>>>>>> +	ARM_SPE_LDST_EXCL	= 1 << 2,
>>>>>> +	ARM_SPE_LDST_ATOMIC	= 1 << 3,
>>>>>> +	ARM_SPE_LDST_ACQREL	= 1 << 4,
>>> Wondering if we can store this in perf_sample->flags. The values are
>>> defined in "util/event.h" (PERF_IP_*). Maybe we can extend it to allow
>>> doing "sample->flags = PERF_LDST_FLAG_LD | PERF_LDST_FLAG_ATOMIC" and
>>> such.
>>>
>>> @Leo do you think that could work?
>> Let's step back a bit and divide the decoding flow into two parts:
>> backend and frontend.
>>
>> For the backend part, we decode the SPE hardware trace data and
>> generate the SPE record in the file
>> util/arm-spe-decoder/arm-spe-decoder.c.  As we want to support
>> complete operation types, we can extend arm_spe_op_type as below:
>>
>> enum arm_spe_op_type {
>>        /* First level operation type */
>> 	ARM_SPE_OP_OTHER        = 1 << 0,
>> 	ARM_SPE_OP_LDST		= 1 << 1,
> [...]
>
> I'm OK with this approach, but perhaps instead the op type should
> just be the raw trace's op-type and op-type-payload? Macros to decode
> this information are already present and extensively used in the text
> decoding of the packet. While it's a little bit harder than just picking
> a bit, the op_type is only used in a single place today outside of
> the existing textual script decoding and what would be this decoding.
> Do we foresee many more uses that would justify having to maintain

I wanted to include some of the SVE/SIMD bits in the perf samples, and
for that I would be using a few of these flags.
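
As a rough sketch of that use case: the first eight values below are
copied from the patch quoted above, while the two EX_ entries and both
helper functions are hypothetical names made up for illustration only;
they are not part of the patch or of perf.

```c
#include <stdbool.h>
#include <stdint.h>

enum arm_spe_op_type {
	/* Values as they appear in the patch under discussion. */
	ARM_SPE_LD		= 1 << 0,
	ARM_SPE_ST		= 1 << 1,
	ARM_SPE_LDST_EXCL	= 1 << 2,
	ARM_SPE_LDST_ATOMIC	= 1 << 3,
	ARM_SPE_LDST_ACQREL	= 1 << 4,
	ARM_SPE_BR		= 1 << 5,
	ARM_SPE_BR_COND		= 1 << 6,
	ARM_SPE_BR_IND		= 1 << 7,
	/* Hypothetical additions, for illustration only. */
	ARM_SPE_EX_SVE		= 1 << 8,
	ARM_SPE_EX_SIMD		= 1 << 9,
};

/* True if the record is a load that also carries the (assumed) SVE bit. */
static bool ex_spe_op_is_sve_load(uint32_t op)
{
	return (op & ARM_SPE_LD) && (op & ARM_SPE_EX_SVE);
}

/* True if the record carries either of the (assumed) vector bits. */
static bool ex_spe_op_is_vector(uint32_t op)
{
	return (op & (ARM_SPE_EX_SVE | ARM_SPE_EX_SIMD)) != 0;
}
```

With one bit per property, a consumer of the record can test
combinations with a plain mask, which is the main convenience of the
decoded form over the raw op-type payload.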

> the immediate format vs finding a way to unify arm_spe_pkt_desc_op_type
> to support both the text decoding and this?
>
> [...]
>> So I am just wondering if we can set the field
>> sample::data_src::mem_lock for atomic operations, like:
>>
>>    data_src.mem_op   = PERF_MEM_OP_LOAD;
>>    data_src.mem_lock = PERF_MEM_LOCK_ATOMIC;
>>
>> The field "mem_lock" is only two bits; we can consider extending the
>> structure with an extra field "mem_lock_ext" if it cannot meet our
>> requirements.
> These are for the LOCK instruction on x86. I don't know that we want to
> overload the meaning here. Minimally there is value in differentiating
> exclusives vs atomics.
>
>>>>>> +	ARM_SPE_BR		= 1 << 5,
>>>>>> +	ARM_SPE_BR_COND		= 1 << 6,
>>>>>> +	ARM_SPE_BR_IND		= 1 << 7,
>>> Seems like we can store BR_COND in the existing "branch-miss" event
>>> (--itrace=b) with:
>>>
>>> sample->flags = PERF_IP_FLAG_BRANCH;
>>> sample->flags |= PERF_IP_FLAG_CONDITIONAL;
>>> and/or
>>> sample->flags |= PERF_IP_FLAG_INDIRECT;
>>>
>>> PERF_IP_FLAG_INDIRECT doesn't exist yet but we can probably add it.
>> Yes, for branch samples, this makes sense to me.
> makes sense to me too.
>
> Ali
>
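
For the branch case, the mapping discussed above could look roughly
like the sketch below. The ARM_SPE_BR* values are the ones from the
quoted patch. The EX_IP_FLAG_* macros are stand-ins for the
PERF_IP_FLAG_* definitions in util/event.h with placeholder values
(and, as noted above, an INDIRECT flag does not exist yet), and
ex_spe_branch_flags() is a hypothetical helper, not existing perf code.

```c
#include <stdint.h>

/* Branch op-type bits as proposed in the patch. */
enum {
	ARM_SPE_BR		= 1 << 5,
	ARM_SPE_BR_COND		= 1 << 6,
	ARM_SPE_BR_IND		= 1 << 7,
};

/* Stand-ins for PERF_IP_FLAG_*; the bit positions are placeholders. */
#define EX_IP_FLAG_BRANCH	(1ULL << 0)
#define EX_IP_FLAG_CONDITIONAL	(1ULL << 1)
#define EX_IP_FLAG_INDIRECT	(1ULL << 2)	/* would have to be added */

/* Translate the SPE branch bits into sample flags; 0 if not a branch. */
static uint64_t ex_spe_branch_flags(uint32_t op)
{
	uint64_t flags;

	if (!(op & ARM_SPE_BR))
		return 0;

	flags = EX_IP_FLAG_BRANCH;
	if (op & ARM_SPE_BR_COND)
		flags |= EX_IP_FLAG_CONDITIONAL;
	if (op & ARM_SPE_BR_IND)
		flags |= EX_IP_FLAG_INDIRECT;

	return flags;
}
```

The result would then be assigned to sample->flags when synthesizing
the branch sample.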


