[PATCH 2/2] perf arm-spe: Parse more SPE fields and store source

Fri Feb 11 20:19:27 PST 2022

Hi German, Ali,

On Fri, Feb 11, 2022 at 04:31:40PM +0000, German Gomez wrote:
> Hi Ali,
> 
> On 28/01/2022 21:02, Ali Saidi wrote:
> > Hi German,
> >
> > On 28/01/2022 19:20, German Gomez wrote:
> >> Hi Ali,
> >>
> >> [...]
> >>>  };
> >>>  
> >>>  enum arm_spe_op_type {
> >>>  	ARM_SPE_LD		= 1 << 0,
> >>>  	ARM_SPE_ST		= 1 << 1,
> >>> +	ARM_SPE_LDST_EXCL	= 1 << 2,
> >>> +	ARM_SPE_LDST_ATOMIC	= 1 << 3,
> >>> +	ARM_SPE_LDST_ACQREL	= 1 << 4,
> 
> Wondering if we can store this in perf_sample->flags. The values are
> defined in "util/event.h" (PERF_IP_*). Maybe we can extend it to allow
> doing "sample->flags = PERF_LDST_FLAG_LD | PERF_LDST_FLAG_ATOMIC" and
> such.
> 
> @Leo do you think that could work?

Let's step back a bit and divide the decoding flow into two parts:
backend and frontend.

For the backend part, we decode the SPE hardware trace data and
generate the SPE record in the file
util/arm-spe-decoder/arm-spe-decoder.c.  As we want to support
complete operation types, we can extend arm_spe_op_type as below:

enum arm_spe_op_type {
        /* First level operation type */
	ARM_SPE_OP_OTHER        = 1 << 0,
	ARM_SPE_OP_LDST		= 1 << 1,
	ARM_SPE_OP_BRANCH_ERET  = 1 << 2,

        /* Second level operation type for OTHER */
        ARM_SPE_OP_SVE_OTHER    = 1 << 16,
        ARM_SPE_OP_SVE_FP       = 1 << 17,
        ARM_SPE_OP_SVE_PRED     = 1 << 18,

        /* Second level operation type for LDST */
        ARM_SPE_OP_LD           = 1 << 16,
        ARM_SPE_OP_ST           = 1 << 17,
        ARM_SPE_OP_ATOMIC       = 1 << 18,
        ARM_SPE_OP_EXCL         = 1 << 19,
        ARM_SPE_OP_AR           = 1 << 20,
        ARM_SPE_OP_SIMD_FP      = 1 << 21,
        ARM_SPE_OP_GP_REG       = 1 << 22,
        ARM_SPE_OP_UNSPEC_REG   = 1 << 23,
        ARM_SPE_OP_NV_SYSREG    = 1 << 24,
        ARM_SPE_OP_SVE_PRED     = 1 << 25,
        ARM_SPE_OP_SVE_SG       = 1 << 26,

        /* Second level operation type for BRANCH_ERET */
        ARM_SPE_OP_BR_COND      = 1 << 16,
        ARM_SPE_OP_BR_INDIRECT  = 1 << 17,
};

IIUC, Ali suggested to directly reuse packet format to express
operation type and don't need to redefine the operation types like
above structure arm_spe_op_type.  I personally bias to convert the raw
packet format to more readable format, a benefit is this would give
us more readable code.

For the frontend, we need to convert Arm SPE record to samples.
We can explore two fields: sample::flags and sample::data_src, for
load/store related operations, I perfer we can fill more complete
info in field sample::data_src and extend the definitions for
perf_mem_data_src; for branch operations, we can fill sample::flags.

So I am just wandering if we can set the field
sample::data_src::mem_lock for atomic operations, like:

    data_src.mem_op   = PERF_MEM_OP_LOAD;
    data_src.mem_lock = PERF_MEM_LOCK_ATOMIC;

The field "mem_lock" is only two bits, we can consider to extend the
structure with an extra filed "mem_lock_ext" if it cannot meet our
requirement.

> >>> +	ARM_SPE_BR		= 1 << 5,
> >>> +	ARM_SPE_BR_COND		= 1 << 6,
> >>> +	ARM_SPE_BR_IND		= 1 << 7,
> 
> Seems like we can store BR_COND in the existing "branch-miss" event
> (--itrace=b) with:
> 
>   sample->flags = PERF_IP_FLAG_BRANCH;
>   sample->flags |= PERF_IP_FLAG_CONDITIONAL;
> and/or
>   sample->flags |= PERF_IP_FLAG_INDIRECT;
> 
> PERF_IP_FLAG_INDIRECT doesn't exist yet but we can probably add it.

Yes, for branch samples, this makes sense for me.

Thanks,
Leo