Perf Event support for ARMv7 (was: Re: [PATCH 5/5] arm/perfevents: implement perf event support for ARMv6)

Jean Pihet jpihet at mvista.com
Mon Dec 21 06:43:07 EST 2009


Hi,

On Monday 21 December 2009 12:04:55 Will Deacon wrote:
> Hi Jean,
>
> I've provided some comments inline. Hopefully they're useful.
Thanks for reviewing the code.

> * Jean Pihet wrote:
> > Hello,
> >
> > Here is a patch that adds the support for ARMv7 processors, using the
> > PMNC HW unit.
> >
> > The code is for review, it has been compiled and boot tested only, the
> > complete testing is in progress. Please let me know if the patch is
> > wrapped or garbled I will send it attached (20KB in size).
> >
> > Feedback is welcome.
>
> <snip>
>
> > diff --git a/arch/arm/kernel/perf_event.c b/arch/arm/kernel/perf_event.c
> > index abb5267..79e92ce 100644
> > --- a/arch/arm/kernel/perf_event.c
> > +++ b/arch/arm/kernel/perf_event.c
> > @@ -4,6 +4,7 @@
> >   * ARM performance counter support.
> >   *
> >   * Copyright (C) 2009 picoChip Designs, Ltd., Jamie Iles
> > + * ARMv7 support: Jean Pihet <jpihet at mvista.com>
> >   *
> >   * This code is based on the sparc64 perf event code, which is in turn based
> >   * on the x86 code. Callchain code is based on the ARM OProfile backtrace
> > @@ -35,8 +36,11 @@ DEFINE_SPINLOCK(pmu_lock);
> >   * ARMv6 supports a maximum of 3 events, starting from index 1. If we add
> >   * another platform that supports more, we need to increase this to be the
> >   * largest of all platforms.
> > + *
> > + * ARMv7 supports up to 5 events:
> > + *  cycle counter CCNT + 4 events counters CNT0..3
> >   */
> > -#define ARMPMU_MAX_HWEVENTS		4
> > +#define ARMPMU_MAX_HWEVENTS		5
>
> The maximum number of event counters on ARMv7 is currently 6 [Cortex-A9],
> plus a cycle counter. Additionally, the number of event counters actually
> available is implementation defined (the cycle counter is mandatory). You
> can find out the number of event counters using the PMCR ((PMCR >> 11) &
> 0x1f).
I think we should support Cortex-A8 for now and add support for Cortex-A9 on
top of it. IIUC generic ARMv7 support is not possible, so I will need
separate handling for Cortex-A8 and -A9. Is that correct?

Unfortunately I do not have any -A9 HW for now. I will look at the spec in
order to spot the differences between the two PMNC units.
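
For reference, the counter-count expression quoted above, (PMCR >> 11) & 0x1f,
can be sketched as a small helper. This is only an illustration, not code from
the patch: the helper names are hypothetical, and the PMCR value is assumed to
have been read already (on ARMv7 typically with
"mrc p15, 0, <Rt>, c9, c12, 0"), so the sketch can be compiled and checked
on any host.

```c
#include <stdint.h>

/* PMCR.N, bits [15:11]: number of event counters implemented. */
#define PMCR_N_SHIFT	11
#define PMCR_N_MASK	0x1f

/*
 * Hypothetical helper (not in the patch): number of event counters,
 * given a PMCR value that the caller has already read from CP15.
 */
static inline unsigned int pmcr_num_event_counters(uint32_t pmcr)
{
	return (pmcr >> PMCR_N_SHIFT) & PMCR_N_MASK;
}

/* Event counters plus the mandatory cycle counter. */
static inline unsigned int pmcr_num_counters(uint32_t pmcr)
{
	return pmcr_num_event_counters(pmcr) + 1;
}
```

With such a helper, num_events could be filled in at init time instead of
being hard-coded, provided ARMPMU_MAX_HWEVENTS is large enough for the
biggest implementation.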

> >  /* The events for a given CPU. */
> >  struct cpu_hw_events {
> > @@ -965,6 +969,701 @@ static struct arm_pmu armv6pmu = {
> >  	.max_period		= (1LLU << 32) - 1,
> >  };
> >
> > +/*
> > + * ARMv7 Performance counter handling code.
> > + *
> > + * Copied from ARMv6 code, with the low level code inspired
> > + *  by the ARMv7 Oprofile code.
> > + *
> > + * ARMv7 has 4 configurable performance counters and a single cycle counter.
> > + * All counters can be enabled/disabled and IRQ masked separately. The cycle
> > + *  counter and all 4 performance counters together can be reset separately.
> > + */
> > +
> > +enum armv7_perf_types {
> > +	ARMV7_PERFCTR_PMNC_SW_INCR		= 0x00,
> > +	ARMV7_PERFCTR_IFETCH_MISS		= 0x01,
> > +	ARMV7_PERFCTR_ITLB_MISS			= 0x02,
> > +	ARMV7_PERFCTR_DCACHE_REFILL		= 0x03,
> > +	ARMV7_PERFCTR_DCACHE_ACCESS		= 0x04,
> > +	ARMV7_PERFCTR_DTLB_REFILL		= 0x05,
> > +	ARMV7_PERFCTR_DREAD			= 0x06,
> > +	ARMV7_PERFCTR_DWRITE			= 0x07,
> > +	ARMV7_PERFCTR_INSTR_EXECUTED		= 0x08,
> > +	ARMV7_PERFCTR_EXC_TAKEN			= 0x09,
> > +	ARMV7_PERFCTR_EXC_EXECUTED		= 0x0A,
> > +	ARMV7_PERFCTR_CID_WRITE			= 0x0B,
> > +	ARMV7_PERFCTR_PC_WRITE			= 0x0C,
> > +	ARMV7_PERFCTR_PC_IMM_BRANCH		= 0x0D,
> > +	ARMV7_PERFCTR_PC_PROC_RETURN		= 0x0E,
> > +	ARMV7_PERFCTR_UNALIGNED_ACCESS		= 0x0F,
> > +	ARMV7_PERFCTR_PC_BRANCH_MIS_PRED	= 0x10,
> > +
> > +	ARMV7_PERFCTR_PC_BRANCH_MIS_USED	= 0x12,
>
> Ok - the events so far are defined by the v7 architecture.
> Note that this doesn't necessarily mean they are all supported by
> the core.
Is there a way to detect the supported PMU events at run-time? Is it harmful 
to use unsupported events?

> > +	ARMV7_PERFCTR_WRITE_BUFFER_FULL		= 0x40,
> > +	ARMV7_PERFCTR_L2_STORE_MERGED		= 0x41,
> > +	ARMV7_PERFCTR_L2_STORE_BUFF		= 0x42,
> > +	ARMV7_PERFCTR_L2_ACCESS			= 0x43,
> > +	ARMV7_PERFCTR_L2_CACH_MISS		= 0x44,
> > +	ARMV7_PERFCTR_AXI_READ_CYCLES		= 0x45,
> > +	ARMV7_PERFCTR_AXI_WRITE_CYCLES		= 0x46,
> > +	ARMV7_PERFCTR_MEMORY_REPLAY		= 0x47,
> > +	ARMV7_PERFCTR_UNALIGNED_ACCESS_REPLAY	= 0x48,
> > +	ARMV7_PERFCTR_L1_DATA_MISS		= 0x49,
> > +	ARMV7_PERFCTR_L1_INST_MISS		= 0x4A,
> > +	ARMV7_PERFCTR_L1_DATA_COLORING		= 0x4B,
> > +	ARMV7_PERFCTR_L1_NEON_DATA		= 0x4C,
> > +	ARMV7_PERFCTR_L1_NEON_CACH_DATA		= 0x4D,
> > +	ARMV7_PERFCTR_L2_NEON			= 0x4E,
> > +	ARMV7_PERFCTR_L2_NEON_HIT		= 0x4F,
> > +	ARMV7_PERFCTR_L1_INST			= 0x50,
> > +	ARMV7_PERFCTR_PC_RETURN_MIS_PRED	= 0x51,
> > +	ARMV7_PERFCTR_PC_BRANCH_FAILED		= 0x52,
> > +	ARMV7_PERFCTR_PC_BRANCH_TAKEN		= 0x53,
> > +	ARMV7_PERFCTR_PC_BRANCH_EXECUTED	= 0x54,
> > +	ARMV7_PERFCTR_OP_EXECUTED		= 0x55,
> > +	ARMV7_PERFCTR_CYCLES_INST_STALL		= 0x56,
> > +	ARMV7_PERFCTR_CYCLES_INST		= 0x57,
> > +	ARMV7_PERFCTR_CYCLES_NEON_DATA_STALL	= 0x58,
> > +	ARMV7_PERFCTR_CYCLES_NEON_INST_STALL	= 0x59,
> > +	ARMV7_PERFCTR_NEON_CYCLES		= 0x5A,
> > +
> > +	ARMV7_PERFCTR_PMU0_EVENTS		= 0x70,
> > +	ARMV7_PERFCTR_PMU1_EVENTS		= 0x71,
> > +	ARMV7_PERFCTR_PMU_EVENTS		= 0x72,
> > +
> > +	ARMV7_PERFCTR_CPU_CYCLES		= 0xFF
> > +};
>
> These events are specific to the Cortex-A8.
> Unfortunately, these numbers clash with events specific
> to the Cortex-A9 [and potentially future v7 cores].
> For example, 0x40 on the A8 is WRITE_BUFFER_FULL but on the
> A9 it is JAVA_BYTECODE_EXEC. This means that you'll need to
> take a similar approach as was taken for ARM11MP vs ARM11*.
OK, so I will need to handle Cortex-A8 and -A9 separately.
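
To illustrate the clash, here is a minimal sketch of per-core event tables.
The type and function names are hypothetical and the tables are deliberately
incomplete; only the 0x40 and 0x41 entries come from the discussion and the
enum above. The point is just that the same event number has to be resolved
against the table of the core that is actually running.

```c
#include <stddef.h>

/* Hypothetical core selector, for illustration only. */
enum v7_core { V7_CORE_A8, V7_CORE_A9 };

struct v7_event_name {
	unsigned int	id;
	const char	*name;
};

/* Deliberately partial tables: just enough to show the 0x40 clash. */
static const struct v7_event_name a8_events[] = {
	{ 0x40, "WRITE_BUFFER_FULL" },
	{ 0x41, "L2_STORE_MERGED" },
};

static const struct v7_event_name a9_events[] = {
	{ 0x40, "JAVA_BYTECODE_EXEC" },
};

/* Returns the core-specific name of an event number, or NULL if unknown. */
const char *v7_event_name(enum v7_core core, unsigned int id)
{
	const struct v7_event_name *tbl;
	size_t n, i;

	if (core == V7_CORE_A8) {
		tbl = a8_events;
		n = sizeof(a8_events) / sizeof(a8_events[0]);
	} else {
		tbl = a9_events;
		n = sizeof(a9_events) / sizeof(a9_events[0]);
	}

	for (i = 0; i < n; i++)
		if (tbl[i].id == id)
			return tbl[i].name;

	return NULL;
}
```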

> <snip>
>
> > +/*
> > + * Available counters
> > + */
> > +#define ARMV7_CCNT 		0
> > +#define ARMV7_CNT0 		1
> > +#define ARMV7_CNT1 		2
> > +#define ARMV7_CNT2 		3
> > +#define ARMV7_CNT3 		4
> > +#define ARMV7_CNTMAX 		5
> > +#define ARMV7_COUNTER_TO_CCNT	(ARMV7_CYCLE_COUNTER - ARMV7_CCNT)
> > +
> > +#define ARMV7_CPU_COUNTER(cpu, counter)	((cpu) * CNTMAX + (counter))
>
> You don't use this macro. I imagine there are others which are no longer
> used too.
OK, I am checking and cleaning up the code.

> <snip>
>
> > +static inline int armv7_pmnc_select_counter(unsigned int cnt)
> > +{
> > +	u32 val;
> > +
> > +	cnt -= ARMV7_COUNTER_TO_CCNT;
> > +
> > +	if ((cnt == ARMV7_CCNT) || (cnt >= ARMV7_CNTMAX)) {
> > +		printk(KERN_ERR "oprofile: CPU%u selecting wrong PMNC counter"
> > +			" %d\n", smp_processor_id(), cnt);
> > +		return -1;
> > +	}
>
> Nice error message :)
Indeed! This is already corrected.

> <snip>
>
> >  static int __init
> >  init_hw_perf_events(void)
> >  {
> > @@ -977,6 +1676,13 @@ init_hw_perf_events(void)
> >                  memcpy(armpmu_perf_cache_map, armv6_perf_cache_map,
> >                         sizeof(armv6_perf_cache_map));
> >                  perf_max_events	= armv6pmu.num_events;
> > +	} else if (cpu_architecture() == CPU_ARCH_ARMv7) {
> > +		armpmu = &armv7pmu;
> > +		memcpy(armpmu_perf_cache_map, armv7_perf_cache_map,
> > +			sizeof(armv7_perf_cache_map));
> > +		perf_max_events	= armv7pmu.num_events;
> > +		/* Initialize & Reset PMNC: C bit and P bit */
> > +		armv7_pmnc_write(ARMV7_PMNC_P | ARMV7_PMNC_C);
> >          } else {
> >                  pr_info("no hardware support available\n");
> >                  perf_max_events = -1;
>
> You'll need to switch on the cpuid to select the correct event mappings.
>
> I've implemented this for oprofile, I'll post it as an RFC after Christmas
> as I won't be able to respond in the meantime.
Ok. Do you know how I can differentiate Cortex-A8 from -A9?
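
One possible way, sketched below, is to look at the primary part number in
the Main ID Register (MIDR): bits [31:24] give the implementer (0x41 for ARM)
and bits [15:4] the part number, which is 0xC08 for Cortex-A8 and 0xC09 for
Cortex-A9. The names here are hypothetical, and the MIDR value is assumed to
have been read already (in the kernel, for instance via read_cpuid_id()); this
is not necessarily how the final patch should do it.

```c
#include <stdint.h>

/* MIDR field extraction: implementer in [31:24], part number in [15:4]. */
#define MIDR_IMPLEMENTER(midr)	(((midr) >> 24) & 0xff)
#define MIDR_PARTNUM(midr)	(((midr) >> 4) & 0xfff)

#define MIDR_IMPL_ARM		0x41
#define MIDR_PART_CORTEX_A8	0xc08
#define MIDR_PART_CORTEX_A9	0xc09

/* Hypothetical result type, for illustration only. */
enum v7_core_id { CORE_UNKNOWN, CORE_CORTEX_A8, CORE_CORTEX_A9 };

/* Classify a core from a MIDR value that the caller has already read. */
enum v7_core_id v7_identify_core(uint32_t midr)
{
	if (MIDR_IMPLEMENTER(midr) != MIDR_IMPL_ARM)
		return CORE_UNKNOWN;

	switch (MIDR_PARTNUM(midr)) {
	case MIDR_PART_CORTEX_A8:
		return CORE_CORTEX_A8;
	case MIDR_PART_CORTEX_A9:
		return CORE_CORTEX_A9;
	default:
		return CORE_UNKNOWN;
	}
}
```

init_hw_perf_events() could then switch on the result to pick the matching
event and cache maps, falling back to "no hardware support" for unknown
parts.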

I will post a new version with the corrections.
>
> Cheers,
>
> Will

Cheers and a good celebration time,
Jean


