[PATCH 2/2] perf arm-spe: Parse more SPE fields and store source

Ali Saidi alisaidi at amazon.com
Tue Feb 22 11:29:43 PST 2022


Hi German & Yan,

Sorry about the delay in responding.

>Hi German, Ali,
>
[...]
>> >>>  };
>> >>>  
>> >>>  enum arm_spe_op_type {
>> >>>  	ARM_SPE_LD		= 1 << 0,
>> >>>  	ARM_SPE_ST		= 1 << 1,
>> >>> +	ARM_SPE_LDST_EXCL	= 1 << 2,
>> >>> +	ARM_SPE_LDST_ATOMIC	= 1 << 3,
>> >>> +	ARM_SPE_LDST_ACQREL	= 1 << 4,
>> 
>> Wondering if we can store this in perf_sample->flags. The values are
>> defined in "util/event.h" (PERF_IP_*). Maybe we can extend it to allow
>> doing "sample->flags = PERF_LDST_FLAG_LD | PERF_LDST_FLAG_ATOMIC" and
>> such.
>> 
>> @Leo do you think that could work?
>
>Let's step back a bit and divide the decoding flow into two parts:
>backend and frontend.
>
>For the backend part, we decode the SPE hardware trace data and
>generate the SPE record in the file
>util/arm-spe-decoder/arm-spe-decoder.c.  As we want to support
>complete operation types, we can extend arm_spe_op_type as below:
>
>enum arm_spe_op_type {
>        /* First level operation type */
>	ARM_SPE_OP_OTHER        = 1 << 0,
>	ARM_SPE_OP_LDST		= 1 << 1,
[...]

I'm OK with this approach, but perhaps the op type should instead just
be the raw trace's op-type and op-type-payload? Macros to decode this
information already exist and are used extensively in the text decoding
of the packet. While that is a little harder than just picking a bit,
op_type is used in only one place today outside of the existing textual
script decoding, and this decoding would be the second. Do we foresee
enough additional uses to justify maintaining the intermediate format,
versus finding a way to unify arm_spe_pkt_desc_op_type so that it
supports both the text decoding and this?
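
To illustrate the raw op-type idea, here is a rough sketch of a record
that keeps the packet fields unmodified and decodes them on demand.
Everything prefixed with ex_/EX_ is a placeholder made up for this
mail: the struct, the helper names, and especially the class values and
bit positions are assumptions, not the definitions in the packet
decoder header. Real code would reuse the existing macros instead.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder op classes; assumed values, not the real SPE encoding. */
enum ex_op_class {
	EX_OP_CLASS_OTHER = 0,
	EX_OP_CLASS_LDST  = 1,
	EX_OP_CLASS_BR    = 2,
};

/* Placeholder payload bits for the load/store class (assumed layout). */
#define EX_OP_ST	(1u << 0)	/* store (clear means load) */
#define EX_OP_AT	(1u << 2)	/* atomic */
#define EX_OP_EXCL	(1u << 3)	/* exclusive */
#define EX_OP_AR	(1u << 4)	/* acquire/release */

/* The record keeps the raw packet fields instead of a derived bitmask. */
struct ex_spe_record {
	uint8_t op_class;
	uint8_t op_payload;
};

static inline bool ex_is_ldst(const struct ex_spe_record *r)
{
	return r->op_class == EX_OP_CLASS_LDST;
}

static inline bool ex_is_store(const struct ex_spe_record *r)
{
	return ex_is_ldst(r) && (r->op_payload & EX_OP_ST);
}

static inline bool ex_is_load(const struct ex_spe_record *r)
{
	return ex_is_ldst(r) && !(r->op_payload & EX_OP_ST);
}

static inline bool ex_is_atomic(const struct ex_spe_record *r)
{
	return ex_is_ldst(r) && (r->op_payload & EX_OP_AT);
}

static inline bool ex_is_excl(const struct ex_spe_record *r)
{
	return ex_is_ldst(r) && (r->op_payload & EX_OP_EXCL);
}
```

With this shape the text decoder and the sample synthesis would both
call the same helpers, so there is only one place that knows how the
payload bits are laid out.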

[...]
>So I am just wondering if we can set the field
>sample::data_src::mem_lock for atomic operations, like:
>
>    data_src.mem_op   = PERF_MEM_OP_LOAD;
>    data_src.mem_lock = PERF_MEM_LOCK_ATOMIC;
>
>The field "mem_lock" is only two bits; we can consider extending the
>structure with an extra field "mem_lock_ext" if it cannot meet our
>requirement.

These are for the LOCK prefix on x86. I'm not sure we want to overload
the meaning here. At a minimum, there is value in differentiating
exclusives from atomics.

>
>> >>> +	ARM_SPE_BR		= 1 << 5,
>> >>> +	ARM_SPE_BR_COND		= 1 << 6,
>> >>> +	ARM_SPE_BR_IND		= 1 << 7,
>> 
>> Seems like we can store BR_COND in the existing "branch-miss" event
>> (--itrace=b) with:
>> 
>>   sample->flags = PERF_IP_FLAG_BRANCH;
>>   sample->flags |= PERF_IP_FLAG_CONDITIONAL;
>> and/or
>>   sample->flags |= PERF_IP_FLAG_INDIRECT;
>> 
>> PERF_IP_FLAG_INDIRECT doesn't exist yet but we can probably add it.
>
>Yes, for branch samples, this makes sense for me.

Makes sense to me too.
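
For the branch case, a minimal sketch of the mapping being discussed
could look like the following. The three ARM_SPE_BR* bits are copied
from the patch above. The EX_IP_FLAG_* names are stand-ins for the
PERF_IP_FLAG_* values in util/event.h, with placeholder bit values, and
EX_IP_FLAG_INDIRECT stands for the flag that does not exist yet. The
helper name ex_branch_flags() is likewise made up.

```c
#include <stdint.h>

/* Subset of the op_type bits proposed in the patch under discussion. */
enum arm_spe_op_type {
	ARM_SPE_BR	= 1 << 5,
	ARM_SPE_BR_COND	= 1 << 6,
	ARM_SPE_BR_IND	= 1 << 7,
};

/* Stand-ins for the perf sample flags; bit values are placeholders. */
#define EX_IP_FLAG_BRANCH	(1u << 0)
#define EX_IP_FLAG_CONDITIONAL	(1u << 1)
#define EX_IP_FLAG_INDIRECT	(1u << 2)	/* hypothetical new flag */

/* Translate decoder op_type bits into flags for a branch sample. */
static inline uint32_t ex_branch_flags(uint32_t op_type)
{
	uint32_t flags;

	/* Not a branch at all: leave the sample flags empty. */
	if (!(op_type & ARM_SPE_BR))
		return 0;

	flags = EX_IP_FLAG_BRANCH;
	if (op_type & ARM_SPE_BR_COND)
		flags |= EX_IP_FLAG_CONDITIONAL;
	if (op_type & ARM_SPE_BR_IND)
		flags |= EX_IP_FLAG_INDIRECT;

	return flags;
}
```

The result would be assigned to sample->flags when the branch sample is
synthesized.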

Ali




More information about the linux-arm-kernel mailing list