[PATCH 1/2] ARM: KVM: Yield CPU when vcpu executes a WFE

Tue Oct 8 12:09:11 EDT 2013

On 08/10/13 16:13, Raghavendra K T wrote:
> On 10/08/2013 08:36 PM, Marc Zyngier wrote:
>>>> Just gave it a go, and the results are slightly (but consistently)
>>>> worse. Over 10 runs:
>>>>
>>>> Without RELAX_INTERCEPT: Average run 3.3623s
>>>> With RELAX_INTERCEPT: Average run 3.4226s
>>>>
>>>> Not massive, but still noticeable. Any clue?
>>>
>>> Is it a 4x overcommit? Probably we would have hit the code
>>> overhead if it were small guests.
>>
>> Only 2x overcommit (dual core host, quad vcpu guests).
> 
> Okay. Quad vcpu seems to explain it.
> 
>>
>>> RELAX_INTERCEPT is worth enabling for large guests with
>>> overcommits.
>>
>> I'll try something more aggressive as soon as I get the time. What do
>> you call a large guest? So far, the hard limit on ARM is 8 vcpus.
>>
> 
> Okay. I was referring to guests >= 32 vcpus.
> Maybe 8-vcpu guests with 2x/4x are worth trying. If we still do not
> see benefit, then it is not worth enabling.

I've just tried the worst case I can construct, which is an 8-vcpu
guest limited to one physical CPU:

Over 10 runs:

Without RELAX_INTERCEPT:
Time: 6.793
Time: 7.619
Time: 6.690
Time: 7.198
Time: 7.659
Time: 7.054
Time: 7.728
Time: 8.546
Time: 7.306
Time: 7.219

Average: 7.381

With RELAX_INTERCEPT:
Time: 6.850
Time: 6.889
Time: 7.170
Time: 6.938
Time: 6.756
Time: 7.341
Time: 6.707
Time: 7.452
Time: 6.617
Time: 8.095

Average: 7.082
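
As a quick cross-check of the two sets of timings above, here is a minimal
sketch (Python, not part of the series) that recomputes the averages and adds
two simple jitter measures. The variable names are made up for illustration,
and treating the sample standard deviation and the max-min spread as "jitter"
is an assumption, not something measured separately.

```python
from statistics import mean, stdev

# Timings copied from the two runs above (10 runs each).
without_ri = [6.793, 7.619, 6.690, 7.198, 7.659,
              7.054, 7.728, 8.546, 7.306, 7.219]
with_ri = [6.850, 6.889, 7.170, 6.938, 6.756,
           7.341, 6.707, 7.452, 6.617, 8.095]


def summarize(times):
    """Return the mean, sample standard deviation and max-min spread."""
    return {
        "mean": mean(times),
        "stdev": stdev(times),
        "spread": max(times) - min(times),
    }


if __name__ == "__main__":
    for label, times in (("Without RELAX_INTERCEPT", without_ri),
                         ("With RELAX_INTERCEPT", with_ri)):
        s = summarize(times)
        print(f"{label}: mean={s['mean']:.3f} "
              f"stdev={s['stdev']:.3f} spread={s['spread']:.3f}")

    # Relative change of the average run time, in percent.
    gain = 100.0 * (mean(without_ri) - mean(with_ri)) / mean(without_ri)
    print(f"Average improvement: {gain:.1f}%")
```

The means come out at about 7.38 and 7.08, matching the averages quoted above,
and both the standard deviation and the spread are smaller with
RELAX_INTERCEPT. With only 10 runs per configuration this is indicative
rather than conclusive.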

We're now starting to see some (small) benefits: about 4% faster on
average with RELAX_INTERCEPT, and less jitter (the heuristic is better at
picking the target vcpu than the default behaviour).
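
For readers unfamiliar with the heuristic, the following is a simplified
model (Python, for illustration only) of the directed-yield eligibility check
as I understand it from the generic KVM code: a vcpu that is itself spinning
is skipped as a yield target every other time, whereas without
RELAX_INTERCEPT every vcpu is considered eligible. The class, function and
field names are hypothetical stand-ins, and the real code applies further
checks (runnability, last boosted vcpu, and so on) that are omitted here.

```python
from dataclasses import dataclass


@dataclass
class Vcpu:
    name: str
    in_spin_loop: bool = False  # vcpu is itself trapped in a WFE/spin exit
    dy_eligible: bool = False   # toggled so a spinner is skipped every other time


def eligible_for_directed_yield(vcpu, relax_intercept=True):
    """Simplified eligibility test; always True without the heuristic."""
    if not relax_intercept:
        return True
    eligible = (not vcpu.in_spin_loop) or vcpu.dy_eligible
    if vcpu.in_spin_loop:
        # Flip the flag so the same spinning vcpu is not starved forever.
        vcpu.dy_eligible = not vcpu.dy_eligible
    return eligible


def pick_yield_target(vcpus, me, relax_intercept=True):
    """Return the first other vcpu that passes the eligibility test."""
    for candidate in vcpus:
        if candidate is me:
            continue
        if eligible_for_directed_yield(candidate, relax_intercept):
            return candidate
    return None


if __name__ == "__main__":
    spinner = Vcpu("vcpu0", in_spin_loop=True)
    worker = Vcpu("vcpu1")
    me = Vcpu("vcpu2", in_spin_loop=True)
    target = pick_yield_target([spinner, worker, me], me)
    print("yield to:", target.name)
```

In this toy setup the heuristic skips the spinning vcpu0 on the first pass and
yields to vcpu1, which is more likely to be doing useful work; without the
heuristic the first candidate is taken regardless of whether it is spinning.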

I'll enable it in the next version of the series.

Thanks!

	M.
-- 
Jazz is not dead. It just smells funny...