oprofile and ARM A9 hardware counter
stephane eranian
eranian at googlemail.com
Mon Jan 30 12:15:53 EST 2012
On Mon, Jan 30, 2012 at 5:08 PM, Måns Rullgård <mans at mansr.com> wrote:
> stephane eranian <eranian at googlemail.com> writes:
>
>> Same result for me on CPU1:
>>
>> top - 16:20:24 up 1:45, 1 user, load average: 0.29, 0.08, 0.07
>> Tasks: 70 total, 2 running, 68 sleeping, 0 stopped, 0 zombie
>> Cpu(s): 30.7%us, 2.7%sy, 0.0%ni, 66.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
>> Mem: 940232k total, 228984k used, 711248k free, 82244k buffers
>> Swap: 524240k total, 0k used, 524240k free, 91400k cached
>>
>> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ P COMMAND
>> 3968 eranian 20 0 644 160 128 R 100 0.0 0:21.98 1 noploop
>> 3969 eranian 20 0 2184 1056 804 R 3 0.1 0:00.53 0 top
>> 82 root 20 0 0 0 0 S 1 0.0 0:01.35 0 kworker/0:1
>>
>> With 3.3.0-rc1, if I revert the clockdomain patch, I get the same result.
>> So it must be coming from somewhere else, as you suggested.
>>
>> If the processor was spending time processing interrupts, then this would be
>> accounted for as sys time. But that's not what I observe here. It's either
>> idle or user. That line leads me to believe that the processor can only run
>> my program for 30% of the time. The rest is spent idling even though my
>> program is non-blocking. How could that be possible? Power-saving?
>
> In top, press 1 to see the statistics for the CPUs separately.
>
OK, when I pin my program to CPU1 and press 1 in top, I get:
Tasks: 69 total, 2 running, 67 sleeping, 0 stopped, 0 zombie
Cpu0 : 0.9%us, 3.8%sy, 0.0%ni, 94.3%id, 0.0%wa, 0.0%hi, 0.9%si, 0.0%st
Cpu1 :100.0%us, 0.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 940232k total, 75480k used, 864752k free, 8148k buffers
Swap: 524240k total, 0k used, 524240k free, 37568k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3788 eranian 20 0 644 160 128 R 100 0.0 0:47.93 noploop
3758 eranian 20 0 9900 1512 712 S 2 0.2 0:01.17 sshd
3789 eranian 20 0 2184 1056 804 R 2 0.1 0:01.22 top
This gives me the right answer. But in 'collapsed mode' (press 1 again),
the aggregate value is bogus. It could be wrong math in top. OK, so that was
a false alarm. Thanks for the help.
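As a rough sanity check on the aggregate line, here is a minimal sketch. It assumes that the collapsed view is equivalent to averaging the per-CPU percentages over the same interval (my assumption; top actually works from the tick counters in /proc/stat). The function name is hypothetical. The per-CPU and aggregate readings above come from different runs, so this is only an approximate comparison, but with one CPU fully busy and one nearly idle I would expect roughly 50%us, not 30.7%us.

```python
def aggregate_percent(per_cpu):
    """Average per-CPU percentages into one system-wide figure.

    Assumes every CPU was measured over the same interval, so each
    contributes equally to the aggregate.
    """
    if not per_cpu:
        raise ValueError("need at least one CPU reading")
    return sum(per_cpu) / len(per_cpu)


# %us values for Cpu0 and Cpu1 from the per-CPU view above.
user = [0.9, 100.0]
print("expected aggregate us: %.2f%%" % aggregate_percent(user))
```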
I still need to investigate why frequency mode does
not yield the expected number of samples, even at a low frequency.
$ taskset -c 1 perf record -e cycles -F 100 noploop 10
$ perf report -D | tail -20
Aggregated stats:
TOTAL events: 475
MMAP events: 11
COMM events: 2
EXIT events: 2
SAMPLE events: 460
cycles stats:
TOTAL events: 475
MMAP events: 11
COMM events: 2
EXIT events: 2
SAMPLE events: 460
460 samples is way too low. With -F 100 over a 10-second run, it should be
100 x 10 = 1000 samples or close to it.
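The arithmetic behind that expectation can be sketched as follows. It assumes noploop stays on-CPU for the whole 10 seconds (which the per-CPU top output suggests) and that frequency mode delivers about one sample per period. The function names are hypothetical. With the numbers above, the observed count is 46% of the expected one.

```python
def expected_samples(freq_hz, duration_s):
    """Samples expected from frequency-mode sampling of a task that is
    running for the entire duration."""
    return freq_hz * duration_s


def sample_yield(observed, freq_hz, duration_s):
    """Fraction of the expected samples that were actually recorded."""
    return observed / expected_samples(freq_hz, duration_s)


# Values from the perf record / perf report run above:
# -F 100, noploop 10, 460 SAMPLE events.
expected = expected_samples(100, 10)
ratio = sample_yield(460, 100, 10)
print("expected %d samples, observed 460, yield %.2f" % (expected, ratio))
```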