oprofile and ARM A9 hardware counter

stephane eranian eranian at googlemail.com
Sun Jan 29 12:36:11 EST 2012


Hi,

Ok, so I did a few more tests and there is a serious issue when sampling
in frequency mode (the default). I noticed a wrong number of samples, so
I investigated further and instrumented the perf_event kernel code.
I found erratic timer ticks causing broken period adjustments.
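
To illustrate why erratic ticks can break the period adjustment, here is a
simplified model of frequency-mode sampling. It is a sketch under stated
assumptions, not the kernel's code: the function name, the 1 GHz counter rate,
the 100 Hz tick and the 1000 samples/s target are all hypothetical, and the
real code additionally damps each adjustment. The idea is that the sampling
period is re-estimated at each tick from the events counted since the previous
tick, using the nominal tick length as the elapsed time.

```python
NSEC_PER_SEC = 1_000_000_000

def estimate_period(events, assumed_nsec, sample_freq):
    """Period that would yield sample_freq samples/s, given `events`
    counted over what is assumed to be `assumed_nsec` nanoseconds."""
    return events * NSEC_PER_SEC // (assumed_nsec * sample_freq)

# Hypothetical numbers: 1 GHz cycle counter, 100 Hz tick, 1000 samples/s target.
rate_hz, tick_nsec, freq = 1_000_000_000, 10_000_000, 1000

# Regular tick: 10 ms really elapsed, so 10,000,000 cycles were counted.
good = estimate_period(rate_hz * tick_nsec // NSEC_PER_SEC, tick_nsec, freq)

# Erratic tick: 30 ms really elapsed, but the estimate still assumes 10 ms.
actual_nsec = 3 * tick_nsec
bad = estimate_period(rate_hz * actual_nsec // NSEC_PER_SEC, tick_nsec, freq)

print(good, bad)                    # 1000000 3000000
print(rate_hz // good, rate_hz // bad)  # achieved samples/s: 1000 333
```

In this model a tick that arrives late inflates the period by the same factor,
so the achieved sample rate drops well below the requested one.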

In fact, the problem is visible using top.
I am running a noploop program on CPU0 and nothing else besides top.
The noploop program does: for(;;);. That is 100% user. On a 2-way
system that is otherwise idle, I expect top to report 50% user, 50% idle.
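
The expected figure follows from top averaging per-CPU utilization into its
Cpu(s) summary line. A minimal sketch of that arithmetic, where the function
name and the per-CPU inputs are assumptions chosen for illustration:

```python
def aggregate_user_pct(per_cpu_user_pct):
    """Average per-CPU user percentages into one system-wide figure."""
    return sum(per_cpu_user_pct) / len(per_cpu_user_pct)

# CPU0 runs noploop (100% user), CPU1 is idle (0% user).
expected = aggregate_user_pct([100.0, 0.0])
print(f"expected user: {expected:.1f}%")  # expected user: 50.0%
```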

top with the commit applied:

top - 16:19:21 up 5 min,  1 user,  load average: 0.23, 0.15, 0.07
Tasks:  70 total,   2 running,  68 sleeping,   0 stopped,   0 zombie
Cpu(s): 31.1%us,  2.0%sy,  0.0%ni, 66.2%id,  0.0%wa,  0.0%hi,  0.7%si,  0.0%st
            ^^^^^^^^ That's WRONG

Mem:    940292k total,    74984k used,   865308k free,     8020k buffers
Swap:   524240k total,        0k used,   524240k free,    37420k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3770 eranian   20   0   644  160  128 R   99  0.0   0:14.21 noploop
 3771 eranian   20   0  2184 1052  804 R    2  0.1   0:00.32 top
    1 root      20   0  2564 1528  952 S    0  0.2   0:01.26 init


I removed that one-liner patch from Ming, the one fiddling with the
clockdomains:

--- a/arch/arm/mach-omap2/clockdomains44xx_data.c
+++ b/arch/arm/mach-omap2/clockdomains44xx_data.c
@@ -390,7 +390,7 @@ static struct clockdomain emu_sys_44xx_clkdm = {
        .prcm_partition   = OMAP4430_PRM_PARTITION,
        .cm_inst          = OMAP4430_PRM_EMU_CM_INST,
        .clkdm_offs       = OMAP4430_PRM_EMU_CM_EMU_CDOFFS,
-       .flags            = CLKDM_CAN_HWSUP,
+       .flags            = CLKDM_CAN_SWSUP,


When I rerun the test, it now works:

top - 16:02:51 up 15 min,  1 user,  load average: 1.02, 0.46, 0.21
Tasks:  70 total,   2 running,  68 sleeping,   0 stopped,   0 zombie
Cpu(s): 47.2%us,  1.0%sy,  0.0%ni, 50.8%id,  0.0%wa,  0.0%hi,  1.0%si,  0.0%st
           ^^^^^^^^ close enough (it stabilizes at around 49%,
which is good)

Mem:    940292k total,    75288k used,   865004k free,     8004k buffers
Swap:   524240k total,        0k used,   524240k free,    37408k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3771 eranian   20   0   644  160  128 R  100  0.0   0:34.44 noploop

Although the patch fixes PMU interrupts, it breaks the timer tick logic somehow.
The perf problem is related to the timer tick.

I am hoping that the tradeoff is not:
    PMU interrupts but broken timer ticks
vs.
    No PMU interrupts but working timer ticks


On Fri, Jan 27, 2012 at 6:16 PM, stephane eranian
<eranian at googlemail.com> wrote:
> On Fri, Jan 27, 2012 at 6:10 PM, Will Deacon <will.deacon at arm.com> wrote:
>> On Fri, Jan 27, 2012 at 05:03:28PM +0000, stephane eranian wrote:
>>> On Fri, Jan 27, 2012 at 5:59 PM, Will Deacon <will.deacon at arm.com> wrote:
>>> > That said, if you see any bugs in the code please do shout!
>>> >
>>> I suspect there is something wrong, we shouldn't hit the max_rate_limit.
>>> You may have bursts of interrupts (samples). I'll check on that this week-end.
>>
>> Ok, thanks. Keep in mind that you probably have variable rate clocks, which
>> will affect the cycle counter frequency.
>>
> I assume it does not vary the clock if the workload is steady and just burning
> cycles, e.g.: for(;;);
>
>>> >> > A7 and A15 have the ability to filter counters based on privilege level, so
>>> >> > you can get more accurate userspace counts there.
>>> >>
>>> >> Ok, that's better. Need to update libpfm4 for A15 with priv levels then!
>>> >
>>> > How do you handle that in libpfm4? On ARM, the event encodings remain the same,
>>> > you just need to set some extra bits to determine which levels are included or
>>> > excluded (you can do this with the perf tool by using the :{u,k,h} suffix on an
>>> > event description).
>>> >
>>> It depends what you call the encoding? If the priv level can be encoded in the
>>> attr->config field, then that's easy. If it needs to be set somewhere else, then
>>> we need to figure out how you encode it in the attr struct. Either in some other
>>> bits in attr->config or use attr->config1, for instance. You tell me.
>>
>> The way it's done with perf is to set the exclude{user,kernel,hv} fields in
>> the attr. The ARM perf backend then translates these into the relevant bits
>> which get orred into the config_base before hitting the hardware.
>>
> Well, that's also how we do it with libpfm4 on X86. This is because
> with perf_events,
> the exclude_* fields have priority over what you set in the attr->config field.
>
>> Will
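
The mapping discussed above, from a :{u,k,h} suffix to the exclude_* attr
fields, can be sketched as follows. This is a minimal model and not the perf
tool's parser: the function name and the dictionary representation are
assumptions, and only the u, k and h modifiers are handled. It assumes that
once any privilege modifier is given, every level not named is excluded.

```python
def exclude_flags(modifiers):
    """Map a privilege suffix such as "u" or "uk" to exclude_* bits."""
    if not modifiers:
        # No suffix: nothing is excluded.
        return {"exclude_user": 0, "exclude_kernel": 0, "exclude_hv": 0}
    return {
        "exclude_user": int("u" not in modifiers),
        "exclude_kernel": int("k" not in modifiers),
        "exclude_hv": int("h" not in modifiers),
    }

# Example: a ":u" suffix keeps user level and excludes kernel and hypervisor.
print(exclude_flags("u"))
```

A backend would then translate these bits into the hardware's own filter bits,
leaving the event encoding in attr->config unchanged.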


