Perf Event support for ARMv7 (was: Re: [PATCH 5/5] arm/perfevents: implement perf event support for ARMv6)

Jean Pihet jpihet at mvista.com
Mon Dec 21 06:29:49 EST 2009


On Saturday 19 December 2009 11:29:05 Jamie Iles wrote:
> On Fri, Dec 18, 2009 at 06:05:29PM +0100, Jean Pihet wrote:
> > Here is a patch that adds the support for ARMv7 processors, using the
> > PMNC HW unit.
> >
> > The code is for review; it has been compiled and boot-tested only, and
> > complete testing is still in progress. Please let me know if the patch is
> > wrapped or garbled and I will send it as an attachment (20KB in size).
>
> Excellent! It looks good to me, with a few minor comments. I don't know if
> it's my mail client, but some of the longer lines appear to wrap onto two
> patch lines; it's not difficult to apply, though.
>
> [snip]
>
> > I had a question about the mapping of events to user space. Although most
> > of the events are mapped in the kernel code, some of the exotic events
> > are not mapped (e.g. the NEON or PMU related events). How can those
> > events be used from user space? Is it done using the raw mappings?
>
> Yes, the raw events should do the trick. 'perf stat -a -e rff -- sleep 1'
> will do cycle counting on v6 using the raw event number.
Ok.
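
For reference, here is a rough user-space sketch of how one of the unmapped
events could be requested through the raw interface, using the
perf_event_open() syscall directly (the 0x4e value is the NEON L2 access
event from the enum below; whether the result is meaningful on a given core
is untested, so treat this as an illustration only):

#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <string.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
	struct perf_event_attr attr;
	uint64_t count;
	int fd;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_RAW;		/* raw, unmapped event */
	attr.config = 0x4e;			/* ARMV7_PERFCTR_L2_NEON */
	attr.disabled = 1;

	/* Count for the current task on any CPU */
	fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
	if (fd < 0) {
		perror("perf_event_open");
		return 1;
	}

	ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
	/* ... run the workload under test ... */
	ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

	read(fd, &count, sizeof(count));
	printf("raw event 0x4e: %llu\n", (unsigned long long)count);
	close(fd);
	return 0;
}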

>
> > +enum armv7_perf_types {
> > +	ARMV7_PERFCTR_PMNC_SW_INCR		= 0x00,
> > +	ARMV7_PERFCTR_IFETCH_MISS		= 0x01,
> > +	ARMV7_PERFCTR_ITLB_MISS			= 0x02,
> > +	ARMV7_PERFCTR_DCACHE_REFILL		= 0x03,
> > +	ARMV7_PERFCTR_DCACHE_ACCESS		= 0x04,
> > +	ARMV7_PERFCTR_DTLB_REFILL		= 0x05,
> > +	ARMV7_PERFCTR_DREAD			= 0x06,
> > +	ARMV7_PERFCTR_DWRITE			= 0x07,
> > +	ARMV7_PERFCTR_INSTR_EXECUTED		= 0x08,
> > +	ARMV7_PERFCTR_EXC_TAKEN			= 0x09,
> > +	ARMV7_PERFCTR_EXC_EXECUTED		= 0x0A,
> > +	ARMV7_PERFCTR_CID_WRITE			= 0x0B,
> > +	ARMV7_PERFCTR_PC_WRITE			= 0x0C,
> > +	ARMV7_PERFCTR_PC_IMM_BRANCH		= 0x0D,
> > +	ARMV7_PERFCTR_PC_PROC_RETURN		= 0x0E,
> > +	ARMV7_PERFCTR_UNALIGNED_ACCESS		= 0x0F,
> > +	ARMV7_PERFCTR_PC_BRANCH_MIS_PRED	= 0x10,
> > +
> > +	ARMV7_PERFCTR_PC_BRANCH_MIS_USED	= 0x12,
> > +
> > +	ARMV7_PERFCTR_WRITE_BUFFER_FULL		= 0x40,
> > +	ARMV7_PERFCTR_L2_STORE_MERGED		= 0x41,
> > +	ARMV7_PERFCTR_L2_STORE_BUFF		= 0x42,
> > +	ARMV7_PERFCTR_L2_ACCESS			= 0x43,
> > +	ARMV7_PERFCTR_L2_CACH_MISS		= 0x44,
> > +	ARMV7_PERFCTR_AXI_READ_CYCLES		= 0x45,
> > +	ARMV7_PERFCTR_AXI_WRITE_CYCLES		= 0x46,
> > +	ARMV7_PERFCTR_MEMORY_REPLAY		= 0x47,
> > +	ARMV7_PERFCTR_UNALIGNED_ACCESS_REPLAY	= 0x48,
> > +	ARMV7_PERFCTR_L1_DATA_MISS		= 0x49,
> > +	ARMV7_PERFCTR_L1_INST_MISS		= 0x4A,
> > +	ARMV7_PERFCTR_L1_DATA_COLORING		= 0x4B,
> > +	ARMV7_PERFCTR_L1_NEON_DATA		= 0x4C,
> > +	ARMV7_PERFCTR_L1_NEON_CACH_DATA		= 0x4D,
> > +	ARMV7_PERFCTR_L2_NEON			= 0x4E,
> > +	ARMV7_PERFCTR_L2_NEON_HIT		= 0x4F,
> > +	ARMV7_PERFCTR_L1_INST			= 0x50,
> > +	ARMV7_PERFCTR_PC_RETURN_MIS_PRED	= 0x51,
> > +	ARMV7_PERFCTR_PC_BRANCH_FAILED		= 0x52,
> > +	ARMV7_PERFCTR_PC_BRANCH_TAKEN		= 0x53,
> > +	ARMV7_PERFCTR_PC_BRANCH_EXECUTED	= 0x54,
> > +	ARMV7_PERFCTR_OP_EXECUTED		= 0x55,
> > +	ARMV7_PERFCTR_CYCLES_INST_STALL		= 0x56,
> > +	ARMV7_PERFCTR_CYCLES_INST		= 0x57,
> > +	ARMV7_PERFCTR_CYCLES_NEON_DATA_STALL	= 0x58,
> > +	ARMV7_PERFCTR_CYCLES_NEON_INST_STALL	= 0x59,
> > +	ARMV7_PERFCTR_NEON_CYCLES		= 0x5A,
> > +
> > +	ARMV7_PERFCTR_PMU0_EVENTS		= 0x70,
> > +	ARMV7_PERFCTR_PMU1_EVENTS		= 0x71,
> > +	ARMV7_PERFCTR_PMU_EVENTS		= 0x72,
> > +
> > +	ARMV7_PERFCTR_CPU_CYCLES		= 0xFF
> > +};
> > +
> > +enum armv7_counters {
> > +	ARMV7_CYCLE_COUNTER = 1,
> > +	ARMV7_COUNTER0,
> > +	ARMV7_COUNTER1,
> > +	ARMV7_COUNTER2,
> > +	ARMV7_COUNTER3,
> > +};
> > +
> > +/*
> > + * The hardware events that we support. We do support cache operations but
> > + * we have harvard caches and no way to combine instruction and data
> > + * accesses/misses in hardware.
> > + */
> > +static const unsigned armv7_perf_map[PERF_COUNT_HW_MAX] = {
> > +	[PERF_COUNT_HW_CPU_CYCLES]	    = ARMV7_PERFCTR_CPU_CYCLES,
> > +	[PERF_COUNT_HW_INSTRUCTIONS]	    = ARMV7_PERFCTR_INSTR_EXECUTED,
> > +	[PERF_COUNT_HW_CACHE_REFERENCES]    = HW_OP_UNSUPPORTED,
> > +	[PERF_COUNT_HW_CACHE_MISSES]	    = HW_OP_UNSUPPORTED,
> > +	[PERF_COUNT_HW_BRANCH_INSTRUCTIONS] = ARMV7_PERFCTR_PC_BRANCH_TAKEN,
> > +	[PERF_COUNT_HW_BRANCH_MISSES]	    = ARMV7_PERFCTR_PC_BRANCH_FAILED,
> > +	[PERF_COUNT_HW_BUS_CYCLES]	    = HW_OP_UNSUPPORTED,
> > +};
> > +
> > +static const unsigned armv7_perf_cache_map[PERF_COUNT_HW_CACHE_MAX]
> > +					  [PERF_COUNT_HW_CACHE_OP_MAX]
> > +					  [PERF_COUNT_HW_CACHE_RESULT_MAX] = {
> > +	[C(L1D)] = {
> > +		/*
> > +		 * The performance counters don't differentiate between read
> > +		 * and write accesses/misses so this isn't strictly correct,
> > +		 * but it's the best we can do. Writes and reads get
> > +		 * combined.
> > +		 */
> > +		[C(OP_READ)] = {
> > +			[C(RESULT_ACCESS)]	= ARMV7_PERFCTR_DCACHE_ACCESS,
> > +			[C(RESULT_MISS)]	= ARMV7_PERFCTR_L1_DATA_MISS,
> > +		},
> > +		[C(OP_WRITE)] = {
> > +			[C(RESULT_ACCESS)]	= ARMV7_PERFCTR_DCACHE_ACCESS,
> > +			[C(RESULT_MISS)]	= ARMV7_PERFCTR_L1_DATA_MISS,
> > +		},
> > +		[C(OP_PREFETCH)] = {
> > +			[C(RESULT_ACCESS)]	= CACHE_OP_UNSUPPORTED,
> > +			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
> > +		},
> > +	},
> > +	[C(L1I)] = {
> > +		[C(OP_READ)] = {
> > +			[C(RESULT_ACCESS)]	= ARMV7_PERFCTR_L1_INST,
> > +			[C(RESULT_MISS)]	= ARMV7_PERFCTR_L1_INST_MISS,
> > +		},
> > +		[C(OP_WRITE)] = {
> > +			[C(RESULT_ACCESS)]	= ARMV7_PERFCTR_L1_INST,
> > +			[C(RESULT_MISS)]	= ARMV7_PERFCTR_L1_INST_MISS,
> > +		},
> > +		[C(OP_PREFETCH)] = {
> > +			[C(RESULT_ACCESS)]	= CACHE_OP_UNSUPPORTED,
> > +			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
> > +		},
> > +	},
> > +	[C(LL)] = {
> > +		[C(OP_READ)] = {
> > +			[C(RESULT_ACCESS)]	= ARMV7_PERFCTR_L2_ACCESS,
> > +			[C(RESULT_MISS)]	= ARMV7_PERFCTR_L2_CACH_MISS,
> > +		},
> > +		[C(OP_WRITE)] = {
> > +			[C(RESULT_ACCESS)]	= ARMV7_PERFCTR_L2_ACCESS,
> > +			[C(RESULT_MISS)]	= ARMV7_PERFCTR_L2_CACH_MISS,
> > +		},
> > +		[C(OP_PREFETCH)] = {
> > +			[C(RESULT_ACCESS)]	= CACHE_OP_UNSUPPORTED,
> > +			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
> > +		},
> > +	},
> > +	[C(DTLB)] = {
> > +		/*
> > +		 * The ARM performance counters can count micro DTLB misses,
> > +		 * micro ITLB misses and main TLB misses. There isn't an event
> > +		 * for TLB misses, so use the micro misses here and if users
> > +		 * want the main TLB misses they can use a raw counter.
> > +		 */
>
> I think this comment needs to be changed for v7. From the events enum it
> doesn't look like v7 has micro TLB events.
Yes, I need to correct this.
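
Something along these lines is probably closer for v7 (just a sketch to show
the direction, the OP_WRITE mapping in particular needs checking against the
TRM):

	[C(DTLB)] = {
		/*
		 * v7 has a DTLB refill event but no access event, so
		 * accesses are unsupported and both reads and writes
		 * are mapped to the refill counter.
		 */
		[C(OP_READ)] = {
			[C(RESULT_ACCESS)]	= CACHE_OP_UNSUPPORTED,
			[C(RESULT_MISS)]	= ARMV7_PERFCTR_DTLB_REFILL,
		},
		[C(OP_WRITE)] = {
			[C(RESULT_ACCESS)]	= CACHE_OP_UNSUPPORTED,
			[C(RESULT_MISS)]	= ARMV7_PERFCTR_DTLB_REFILL,
		},
		[C(OP_PREFETCH)] = {
			[C(RESULT_ACCESS)]	= CACHE_OP_UNSUPPORTED,
			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
		},
	},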

> > +static inline int armv7_pmnc_select_counter(unsigned int cnt)
> > +{
> > +	u32 val;
> > +
> > +	cnt -= ARMV7_COUNTER_TO_CCNT;
> > +
> > +	if ((cnt == ARMV7_CCNT) || (cnt >= ARMV7_CNTMAX)) {
> > +		printk(KERN_ERR "oprofile: CPU%u selecting wrong PMNC counter"
> > +			" %d\n", smp_processor_id(), cnt);
>
> Most of the printk's refer to oprofile. Could we use pr_err() etc so we get
> the same prefix for all messages?
Oops, this is a leftover from Oprofile. I will correct the messages and use
the pr_* macros instead of printk.
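
For example, the check in armv7_pmnc_select_counter() would then look
something like this (sketch only; the error return is whatever the current
code already uses, it is snipped in the quote above):

	if ((cnt == ARMV7_CCNT) || (cnt >= ARMV7_CNTMAX)) {
		pr_err("CPU%u selecting wrong PMNC counter %d\n",
		       smp_processor_id(), cnt);
		return -1;	/* or the existing error value */
	}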

> [snip]
>
> > +		if (armv7_pmnc_select_counter(counter) == counter)
> > +			asm volatile("mrc p15, 0, %0, c9, c13, 2"
> > +				     : "=r" (value));
>
> Does this sequence need some locking to make sure that we really do read
> from the counter that we've selected? The same applies to the other places.
In fact armv7_pmnc_select_counter is used by armv7pmu_read_counter,
armv7pmu_write_counter and armv7pmu_enable_event, which are called by the
generic perf events code. Is that enough of a guarantee for atomic access,
or do we need some extra locking?
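
If it turns out that the generic code does not serialise these paths, one
possible shape (just a sketch; pmu_lock here is a hypothetical spinlock that
would have to be defined next to the PMU state) would be to keep the counter
selection and the access in one critical section:

	unsigned long flags;

	spin_lock_irqsave(&pmu_lock, flags);
	if (armv7_pmnc_select_counter(counter) == counter)
		asm volatile("mrc p15, 0, %0, c9, c13, 2"
			     : "=r" (value));
	spin_unlock_irqrestore(&pmu_lock, flags);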

I will post a new version as soon as the changes are made and tested on a
board.

> Jamie

Thanks,
Jean


