oprofile and ARM A9 hardware counter

stephane eranian eranian at googlemail.com
Mon Jan 30 15:45:44 EST 2012


On Mon, Jan 30, 2012 at 8:14 PM, Will Deacon <will.deacon at arm.com> wrote:
> On Mon, Jan 30, 2012 at 05:45:19PM +0000, stephane eranian wrote:
>> There you go, no attachment, not sure the omap list
>> supports this.
>
> Cheers Stephane.
>
>> There is something quite interesting to observe.
>>
>> While I run perf record -e cycles -F 100 noploop 10, I watch
>> /proc/interrupts. The number of interrupts is way lower than
>> expected. Therefore the number of samples is way too low:
>>
>> $ perf record -e cycles -F 100 noploop 10
>> $ perf report -D | tail -20
>> cycles stats:
>>            TOTAL events:        535
>>             MMAP events:         11
>>             COMM events:          2
>>             EXIT events:          2
>>           SAMPLE events:        520
>>
>> The delta in /proc/interrupts on CPU1 is 520 interrupts.
>
> Yes, that is about half of what you'd expect. Running on my A9 platform
> (vexpress) I get:
>
> $ perf record -e cycles -F 100 noploop 10
> $ perf report -D | tail -20
> cycles stats:
>           TOTAL events:       1007
>            MMAP events:         18
>            COMM events:          2
>            EXIT events:          2
>          SAMPLE events:        985
>
>> So looks like the frequency adjustment which is hooked off of the
>> timer tick is either not called at each timer tick, the timer ticks are
>> not at regular interval, or the math is wrong.
>
> My hunch is that the interval is probably varying, but I don't know much
> about OMAP4 and its clocks.
>
Glad you tested this. At least it seems the generic perf_event code
is all right. I agree with you, something is fishy with the clocks.
Just out of curiosity, what is the HZ value for your board? On my
Panda it's 128 Hz.
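
To put a number on the shortfall: at -F 100 over 10 seconds one expects about 1000 samples. The sketch below compares that with the two SAMPLE counts quoted above and, under a deliberately simplified model, estimates how far off the tick interval would have to be to explain the gap. The model assumes the period is recomputed at each tick as (events seen in the tick) / (target frequency * assumed tick length). That is a simplification for illustration, not the actual kernel code, and the function names are hypothetical.

```python
# Toy model only: NOT the kernel's frequency-adjustment code.
# Assumption: period = events_in_tick / (freq * assumed_tick_length),
# so the achieved sample rate is freq * (assumed_tick / actual_tick).

def expected_samples(freq_hz, seconds):
    """Samples expected from 'perf record -F freq_hz' over the run."""
    return freq_hz * seconds

def implied_tick_stretch(observed, expected):
    """Ratio actual_tick / assumed_tick that the toy model needs
    to turn 'expected' samples into 'observed' samples."""
    return expected / observed

exp = expected_samples(100, 10)
for board, observed in (("Panda (OMAP4)", 520), ("vexpress", 985)):
    ratio = observed / exp
    stretch = implied_tick_stretch(observed, exp)
    print(f"{board}: {observed}/{exp} = {ratio:.3f} of expected, "
          f"implied tick stretch {stretch:.2f}x")
```

On these numbers the Panda would need ticks arriving roughly 1.9x further apart than assumed, while vexpress is within about 2% of nominal. This is only a hint of where to look, not a diagnosis.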

>> If I go with the fixed period mode:
>> $ perf stat -e cycles noploop 10
>> noploop for 10 seconds
>>  Performance counter stats for 'noploop 10':
>>        10079156960 cycles                    #    0.000 GHz
>>       10.004547117 seconds time elapsed
>>
>> That means, if I want 100 samples/sec: period = 10079156960/(10*100) = 10079157
>> $ perf record -e cycles -c 10079157 noploop 10
>> $ perf report -D | tail -20
>> cycles stats:
>>            TOTAL events:       1003
>>             MMAP events:         11
>>             COMM events:          2
>>             EXIT events:          2
>>         THROTTLE events:          1
>>       UNTHROTTLE events:          1
>>           SAMPLE events:        986
>>
>> Now, we're getting the right answer!
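
The period arithmetic above is just the total cycle count divided by the number of samples wanted. A minimal sketch, using a hypothetical helper name and the cycle count from the perf stat run quoted above:

```python
def fixed_period(total_cycles, seconds, samples_per_sec):
    """Period for 'perf record -c' that gives about samples_per_sec
    samples per second, given total_cycles counted over 'seconds'."""
    return round(total_cycles / (seconds * samples_per_sec))

period = fixed_period(10079156960, 10, 100)
print(period)  # 10079157
print(f"perf record -e cycles -c {period} noploop 10")
```

This assumes the workload burns cycles at the same rate in the perf stat run and in the perf record run, which should be reasonable for a fixed-duration busy loop like noploop.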
>
> Just to confirm, for me:
>
> $ perf stat -e cycles ./noploop 10
> noploop for 10 seconds
>
>  Performance counter stats for './noploop 10':
>
>        4001163930 cycles                    #    0.000 GHz
>
>      10.006534024 seconds time elapsed
>
> $ perf record -e cycles -c 4001163 ./noploop 10
> $ perf report -D | tail -20
>  Aggregated stats:
>           TOTAL events:       1020
>            MMAP events:         18
>            COMM events:          2
>            EXIT events:          2
>          SAMPLE events:        998
> cycles stats:
>           TOTAL events:       1020
>            MMAP events:         18
>            COMM events:          2
>            EXIT events:          2
>          SAMPLE events:        998
>
> which is close enough :)
>
>> We need to elucidate what's going on in perf_event_task_tick().
>> I have tried with my throttling fix and it did not help. We are
>> not subject to throttling with such a low rate.
>
> Ok. I would start by looking at the clock ticks if I were you, since this
> seems to be alright on my board.
>
> Will


