kvm vs host (arm64)

Marc Zyngier marc.zyngier at arm.com
Mon Apr 20 04:02:14 PDT 2015


Don't top post. This is very annoying.

On 20/04/15 11:39, Mohan G wrote:
> Thanks for looking into this, Marc.
> It's the X-Gene Storm based SoC. For profiling, we used the ftrace
> tool; ftrace support is present from 3.16 onwards. It's the
> mainline kernel that we have installed. The main purpose of running
> this benchmark is I/O.
> We initially saw these numbers with dd, and the dd numbers reflect the same.
> 
> We even tried netperf, just to remove the I/O path from the perf results.
> Here too the results are the same. I have pasted the perf stat output below too.



> guest stat 
> ========== 
> 
> localhost:~]# perf stat dd if=/dev/zero of=/dev/sdc bs=8192 count=1 oflag=direct
> 1+0 records in 
> 1+0 records out 
> 8192 bytes (8.2 kB) copied, 0.0132908 s, 616 kB/s 
> 
> Performance counter stats for 'dd if=/dev/zero of=/dev/sdc bs=8192 count=1 oflag=direct': 
> 
> 110.474128 task-clock (msec) # 0.848 CPUs utilized 
> 1 context-switches # 0.009 K/sec 
> 0 cpu-migrations # 0.000 K/sec 
> 174 page-faults # 0.002 M/sec 
> <not supported> cycles 
> <not supported> stalled-cycles-frontend 
> <not supported> stalled-cycles-backend 
> <not supported> instructions 
> <not supported> branches 
> <not supported> branch-misses 
> 
> 0.130255744 seconds time elapsed 

Do you realize that:
- You're using what looks like a userspace emulated device. Do you
expect any form of performance with that kind of setup?
- Your "benchmark" is absolutely meaningless (who wants to transfer 8k
to measure bandwidth?)

For the record:

root at muffin-man:~# dd if=/dev/zero of=/dev/vda5 bs=8192 count=1 oflag=direct
1+0 records in
1+0 records out
8192 bytes (8.2 kB) copied, 0.00110308 s, 7.4 MB/s

And I still maintain that this is an absolutely meaningless test.

Thanks,

	M.
> 
> 
> 
> host 
> ===== 
> root at mustang1:/home/gmohan# perf stat dd if=/dev/zero of=/dev/sda6 bs=8192 count=1 oflag=direct 
> 1+0 records in 
> 1+0 records out 
> 8192 bytes (8.2 kB) copied, 0.00087308 s, 9.4 MB/s 
> 
> Performance counter stats for 'dd if=/dev/zero of=/dev/sda6 bs=8192 count=1 oflag=direct': 
> 
> 1.024280 task-clock (msec) # 0.525 CPUs utilized 
> 9 context-switches # 0.009 M/sec 
> 0 cpu-migrations # 0.000 K/sec 
> 198 page-faults # 0.193 M/sec 
> 24,17,939 cycles # 2.361 GHz 
> <not supported> stalled-cycles-frontend 
> <not supported> stalled-cycles-backend 
> 8,30,511 instructions # 0.34 insns per cycle 
> <not supported> branches 
> 17,198 branch-misses # 0.00% of all branches 
> 
> 0.001949620 seconds time elapsed 
> 
> 
> 
> Regards
> Mohan
> 
> 
> ----- Original Message -----
> From: Marc Zyngier <marc.zyngier at arm.com>
> To: Mohan G <mohan_gg at yahoo.com>; "linux-arm-kernel at lists.infradead.org" <linux-arm-kernel at lists.infradead.org>
> Cc: 
> Sent: Monday, April 20, 2015 2:39 PM
> Subject: Re: kvm vs host (arm64)
> 
> On 20/04/15 06:45, Mohan G wrote:
>> Hi, 
>> I have got hold of a few Mustang boards (cortex-a57). Ran a few bench
> 
> Mustang is *not* based on Cortex-A57. So which hardware do you have?
> 
>> marks to measure perf numbers between host and guest (KVM). The numbers
>> are pretty bad (a drop of about 90% compared to the host). I even tried
>> running this simple program:
>>
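>> int main(void)
>> {
>>     int i = 0;
>>
>>     /* empty loop body: the program does essentially no work */
>>     for (i = 0; i < 10; i++)
>>         ;
>>     return 0;
>> }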
>> Profiling the above shows that the same kernel functions take almost
>> 10x longer in the guest than on the host. Sample below:
>>
>>
>> Host 
>> ==== 
>> one-3920  [003] 20015.611563: funcgraph_entry:                   |              find_vma() {
>> one-3920  [003] 20015.611564: funcgraph_entry:        0.180 us   |                vmacache_find();
>> one-3920  [003] 20015.611565: funcgraph_entry:        0.120 us   |                vmacache_update();
>> one-3920  [003] 20015.611566: funcgraph_exit:         2.320 us   |              }
>>
>>
>> Guest 
>> ===== 
>>
>> one-751   [000]   206.843300: funcgraph_entry:                   |              find_vma() { 
>> one-751   [000]   206.843312: funcgraph_entry:        4.880 us   |                vmacache_find(); 
>> one-751   [000]   206.843335: funcgraph_entry:        2.656 us   |                vmacache_update(); 
>> one-751   [000]   206.843354: funcgraph_exit:       + 46.256 us  |              } 
> 
> 
> I wonder how you manage to profile this, as we don't have any perf
> support in KVM yet (you cannot profile a guest). Can you describe your
> profiling method? Also, can you use a non-trivial test (i.e. something
> that is not pure overhead)?
> 
> If that's all your test does, you end up measuring the cost of a stage-2
> page fault, which only happens at startup.
> 
>> kernel: 3.18.9 
> 
> Is that mainline 3.18.9? Or some special tree? I'm also interested in
> seeing results from a 4.0 kernel.
> 
> Thanks,
> 
> 
>     M.
> 


-- 
Jazz is not dead. It just smells funny...


