oprofile and ARM A9 hardware counter

stephane eranian eranian at googlemail.com
Mon Jan 30 12:15:53 EST 2012


On Mon, Jan 30, 2012 at 5:08 PM, Måns Rullgård <mans at mansr.com> wrote:
> stephane eranian <eranian at googlemail.com> writes:
>
>> Same result for me on CPU1:
>>
>> top - 16:20:24 up  1:45,  1 user,  load average: 0.29, 0.08, 0.07
>> Tasks:  70 total,   2 running,  68 sleeping,   0 stopped,   0 zombie
>> Cpu(s): 30.7%us,  2.7%sy,  0.0%ni, 66.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
>> Mem:    940232k total,   228984k used,   711248k free,    82244k buffers
>> Swap:   524240k total,        0k used,   524240k free,    91400k cached
>>
>>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  P COMMAND
>>  3968 eranian   20   0   644  160  128 R  100  0.0   0:21.98 1 noploop
>>  3969 eranian   20   0  2184 1056  804 R    3  0.1   0:00.53 0 top
>>    82 root      20   0     0    0    0 S    1  0.0   0:01.35 0 kworker/0:1
>>
>> With 3.3.0-rc1, if I revert the clockdomain patch, I get the same result.
>> So it must be coming from somewhere else, as you suggested.
>>
>> If the processor were spending time processing interrupts, that time would be
>> accounted for as sys time. But that's not what I observe here. It's either
>> idle or user. The Cpu(s) line leads me to believe that the processor can only
>> run my program for 30% of the time. The rest is spent idling even though my
>> program is non-blocking. How could that be possible? Power-saving?
>
> In top, press 1 to see the statistics for the CPUs separately.
>
OK, when I pin my program to CPU1 and press 1 in top, I get:
Tasks:  69 total,   2 running,  67 sleeping,   0 stopped,   0 zombie
Cpu0  :  0.9%us,  3.8%sy,  0.0%ni, 94.3%id,  0.0%wa,  0.0%hi,  0.9%si,  0.0%st
Cpu1  :100.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:    940232k total,    75480k used,   864752k free,     8148k buffers
Swap:   524240k total,        0k used,   524240k free,    37568k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3788 eranian   20   0   644  160  128 R  100  0.0   0:47.93 noploop
 3758 eranian   20   0  9900 1512  712 S    2  0.2   0:01.17 sshd
 3789 eranian   20   0  2184 1056  804 R    2  0.1   0:01.22 top

That gives me the right answer. But in 'collapsed mode' (press 1 again),
the aggregate value is bogus. It could be wrong math in top. OK, so that
was a false alarm. Thanks for the help.
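
For reference, here is a minimal sketch of how a combined figure could be
derived from per-CPU counters. It assumes the summary line is computed from
tick deltas summed over all CPUs, in the field order of the cpuN lines in
/proc/stat. The helper names and the tick counts are made up for illustration;
they are not measurements from this board and not top's actual code.

```python
# Field order of a "cpuN" line in /proc/stat:
# user nice system idle iowait irq softirq steal
FIELDS = ("us", "ni", "sy", "id", "wa", "hi", "si", "st")


def percentages(before, after):
    """Turn two tick snapshots of one CPU into per-state percentages."""
    deltas = [a - b for a, b in zip(after, before)]
    total = sum(deltas)
    return {
        name: (100.0 * d / total if total else 0.0)
        for name, d in zip(FIELDS, deltas)
    }


def aggregate(per_cpu_before, per_cpu_after):
    """Sum the ticks of all CPUs column by column, then compute percentages."""
    before = [sum(col) for col in zip(*per_cpu_before)]
    after = [sum(col) for col in zip(*per_cpu_after)]
    return percentages(before, after)


# Hypothetical tick counts: CPU0 almost idle, CPU1 fully busy in user mode.
zero = [0] * 8
cpu0_after = [1, 0, 4, 95, 0, 0, 0, 0]
cpu1_after = [100, 0, 0, 0, 0, 0, 0, 0]

per_cpu = [percentages(zero, cpu0_after), percentages(zero, cpu1_after)]
combined = aggregate([zero, zero], [cpu0_after, cpu1_after])
print(per_cpu[1]["us"], combined["us"], combined["id"])
```

Under that assumption, one fully busy CPU next to a nearly idle one on a
two-CPU system would give a combined user figure of about 50%.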

I still need to investigate why frequency mode does not yield
the correct number of samples, even at a low frequency.


$ taskset -c 1 perf record -e cycles -F 100 noploop 10
$ perf report -D | tail -20
Aggregated stats:
           TOTAL events:        475
            MMAP events:         11
            COMM events:          2
            EXIT events:          2
          SAMPLE events:        460
cycles stats:
           TOTAL events:        475
            MMAP events:         11
            COMM events:          2
            EXIT events:          2
          SAMPLE events:        460

460 samples is way too low. With -F 100 over a 10-second run, it should be
100 x 10 = 1000 samples, or close to it.
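
The shortfall can be checked mechanically. The following is only a sketch: it
assumes that "noploop 10" runs for 10 seconds and that the report text contains
a "SAMPLE events:" line formatted as above. The function names are made up for
illustration.

```python
import re


def sample_count(report_text):
    """Return the first 'SAMPLE events' count found in the report text."""
    match = re.search(r"SAMPLE events:\s+(\d+)", report_text)
    return int(match.group(1)) if match else None


def expected_samples(freq_hz, seconds):
    """In frequency mode, the target is freq_hz samples per second."""
    return freq_hz * seconds


# Tail of the 'perf report -D' output quoted above.
report = """\
Aggregated stats:
           TOTAL events:        475
            MMAP events:         11
            COMM events:          2
            EXIT events:          2
          SAMPLE events:        460
"""

observed = sample_count(report)
expected = expected_samples(100, 10)   # -F 100, assumed 10 s run
ratio = observed / expected
print(f"observed={observed} expected={expected} ratio={ratio:.0%}")
```

For this run, that is 460 of the 1000 expected samples, or 46%.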


