[PATCH 2/2] perf arm-spe: Parse more SPE fields and store source
German Gomez
german.gomez at arm.com
Mon Feb 21 12:41:43 PST 2022
Hi Leo, Ali,
On 12/02/2022 04:19, Leo Yan wrote:
> Hi German, Ali,
>
> On Fri, Feb 11, 2022 at 04:31:40PM +0000, German Gomez wrote:
>> Hi Ali,
>>
>> [...]
> Let's step back a bit and divide the decoding flow into two parts:
> backend and frontend.
Sorry for derailing the conversation.
(I made some additional comments on the generation of samples below)
> enum arm_spe_op_type {
> /* First level operation type */
> ARM_SPE_OP_OTHER = 1 << 0,
> ARM_SPE_OP_LDST = 1 << 1,
> ARM_SPE_OP_BRANCH_ERET = 1 << 2,
>
> /* Second level operation type for OTHER */
> ARM_SPE_OP_SVE_OTHER = 1 << 16,
> ARM_SPE_OP_SVE_FP = 1 << 17,
> ARM_SPE_OP_SVE_PRED = 1 << 18,
>
> /* Second level operation type for LDST */
> ARM_SPE_OP_LD = 1 << 16,
> ARM_SPE_OP_ST = 1 << 17,
> ARM_SPE_OP_ATOMIC = 1 << 18,
> ARM_SPE_OP_EXCL = 1 << 19,
> ARM_SPE_OP_AR = 1 << 20,
> ARM_SPE_OP_SIMD_FP = 1 << 21,
> ARM_SPE_OP_GP_REG = 1 << 22,
> ARM_SPE_OP_UNSPEC_REG = 1 << 23,
> ARM_SPE_OP_NV_SYSREG = 1 << 24,
> ARM_SPE_OP_SVE_PRED = 1 << 25,
> ARM_SPE_OP_SVE_SG = 1 << 26,
>
> /* Second level operation type for BRANCH_ERET */
> ARM_SPE_OP_BR_COND = 1 << 16,
> ARM_SPE_OP_BR_INDIRECT = 1 << 17,
> };
>
> IIUC, Ali suggested to directly reuse packet format to express
> operation type and don't need to redefine the operation types like
> above structure arm_spe_op_type. I personally bias to convert the raw
> packet format to more readable format, a benefit is this would give
> us more readable code.
I personally like this method as well
>
> For the frontend, we need to convert Arm SPE record to samples.
> We can explore two fields: sample::flags and sample::data_src, for
> load/store related operations, I perfer we can fill more complete
> info in field sample::data_src and extend the definitions for
> perf_mem_data_src; for branch operations, we can fill sample::flags.
>
> So I am just wandering if we can set the field
> sample::data_src::mem_lock for atomic operations, like:
>
> data_src.mem_op = PERF_MEM_OP_LOAD;
> data_src.mem_lock = PERF_MEM_LOCK_ATOMIC;
>
> The field "mem_lock" is only two bits, we can consider to extend the
> structure with an extra filed "mem_lock_ext" if it cannot meet our
> requirement.
Indeed it makes more sense to use data_src as much as possible. Thanks
for pointing that out.
Some comments:
# ARM_SPE_OP_ATOMIC
This might be a hack, but can we not represent it as both LD&SR as the
atomic op would combine both?
data_src.mem_op = PERF_MEM_OP_LOAD | PERF_MEM_OP_STORE;
# ARM_SPE_OP_EXCL (instructions ldxr/stxr)
x86 doesn't seem to have similar instructions with similar semantics
(please correct me if I'm wrong). For this arch, PERF_MEM_LOCK_LOCK
probably suffices.
PPC seems to have similar instructions to arm64 (lwarx/stwcx). I don't
know if they also have instructions with same semantics as x86.
I think it makes sense to have a PERF_MEM_LOCK_EXCL. If not, reusing
PERF_MEM_LOCK_LOCK is the quicker alternative.
# ARM_SPE_OP_SVE_SG
(I'm sorry if this is too far out of scope of the original patch. Let
me know if you would prefer to discuss it on a separate channel)
On a separate note, I'm also looking at incorporating some of the SVE
bits in the perf samples.
For this, do you think it makes sense to have two mem_* categories in
perf_mem_data_src:
mem_vector (2 bits)
- simd
- other (SVE in arm64)
mem_src (1 bit)
- sparse (scatter/gather loads/stores in SVE, as well as simd)
---
Thanks,
German
>>>>> + ARM_SPE_BR = 1 << 5,
>>>>> + ARM_SPE_BR_COND = 1 << 6,
>>>>> + ARM_SPE_BR_IND = 1 << 7,
>> Seems like we can store BR_COND in the existing "branch-miss" event
>> (--itrace=b) with:
>>
>> sample->flags = PERF_IP_FLAG_BRANCH;
>> sample->flags |= PERF_IP_FLAG_CONDITIONAL;
>> and/or
>> sample->flags |= PERF_IP_FLAG_INDIRECT;
>>
>> PERF_IP_FLAG_INDIRECT doesn't exist yet but we can probably add it.
> Yes, for branch samples, this makes sense for me.
>
> Thanks,
> Leo
More information about the linux-arm-kernel
mailing list