[PATCH 2/2] perf arm-spe: Parse more SPE fields and store source
Leo Yan
leo.yan at linaro.org
Sun Feb 27 05:20:19 PST 2022
On Mon, Feb 21, 2022 at 08:41:43PM +0000, German Gomez wrote:
[...]
> Some comments:
>
> # ARM_SPE_OP_ATOMIC
>
> This might be a hack, but can we not represent it as both LD&SR as the
> atomic op would combine both?
>
> data_src.mem_op = PERF_MEM_OP_LOAD | PERF_MEM_OP_STORE;
BTH, I don't understand well for this question, but let me explain a
bit:
We cannot use 'LOAD | STORE' to present the atomic operation. Please
see Armv8 ARM section D10.2.7 Operation Type packet, it would give out
more details. Atomic operation is an extra attribution for a load or
store operations, it could be an atomic load or store, or
load-acquire/store-release instructions, or
load-exclusive/store-exclusive instructions.
The function arm_spe_pkt_desc_op_type() in perf would also give more
info about atomic operations.
> # ARM_SPE_OP_EXCL (instructions ldxr/stxr)
>
> x86 doesn't seem to have similar instructions with similar semantics
> (please correct me if I'm wrong). For this arch, PERF_MEM_LOCK_LOCK
> probably suffices.
>
> PPC seems to have similar instructions to arm64 (lwarx/stwcx). I don't
> know if they also have instructions with same semantics as x86.
>
> I think it makes sense to have a PERF_MEM_LOCK_EXCL. If not, reusing
> PERF_MEM_LOCK_LOCK is the quicker alternative.
On Arm archs, I think OP_EXCL means Load-Exclusive and Store-Exclusive
instructions. Different archs have different memory model and
different atomic instructions, e.g. on Armv7, we uses Load-Exclusive and
Store-Exclusive instructions for spinlock and on Armv8 we uses
Load-Acquire and Store-Release instructions for spinlock.
I have no any knowledge for x86 and PPC archs. Seems to me, x86 uses
compare-and-swap instruction and PPC's lwarx/stwcx instructions "are
primitive, or simple, instructions used to perform a read-modify-write
operation to storage" [1].
So I personally think we can define PERF_MEM_LOCK_EXCL type for Arm
arches and fill into the field perf_mem_data_src::lock:
data_src.lock = PERF_MEM_LOCK_EXCL;
... or we can consider to introduce a field perf_mem_data_src::atomic
and fill a new type PERF_MEM_ATOMIC_EXCL into this new field:
data_src.atomic = PERF_MEM_ATOMIC_EXCL;
[1] https://www.ibm.com/docs/en/aix/7.2?topic=set-lwarx-load-word-reserve-indexed-instruction
> # ARM_SPE_OP_SVE_SG
>
> (I'm sorry if this is too far out of scope of the original patch. Let
> me know if you would prefer to discuss it on a separate channel)
>
> On a separate note, I'm also looking at incorporating some of the SVE
> bits in the perf samples.
>
> For this, do you think it makes sense to have two mem_* categories in
> perf_mem_data_src:
>
> mem_vector (2 bits)
> - simd
> - other (SVE in arm64)
I think we can define below vector types:
PERF_MEM_VECTOR_SIMD
PERF_MEM_VECTOR_SVE
The tricky thing is "other"... Based on the description for "Operation
Type packet payload (Other)" in the Armv8 Arm, I think we even need to
add an extra operation type PERF_MEM_OP_OTHER and assign it to
data_src.mem_op field.
> mem_src (1 bit)
> - sparse (scatter/gather loads/stores in SVE, as well as simd)
How about the naming "mem_attr" for new field and define two
attributions:
PERF_MEM_ATTR_SPARSE -> Gather/Scatter operation
PERF_MEM_ATTR_PRED -> Predicated operation
Just remind, we cannot only approve within Arm related developers,
it's good to seek more wider review from other Arch developers when
you send new patch set.
Thanks,
Leo
More information about the linux-arm-kernel
mailing list