[PATCH 1/2] ARM: KVM: Yield CPU when vcpu executes a WFE
Marc Zyngier
marc.zyngier at arm.com
Mon Oct 7 12:16:30 EDT 2013
On 07/10/13 17:04, Alexander Graf wrote:
>
> On 07.10.2013, at 17:40, Marc Zyngier <marc.zyngier at arm.com> wrote:
>
>> On an (even slightly) oversubscribed system, spinlocks are quickly
>> becoming a bottleneck, as some vcpus are spinning, waiting for a
>> lock to be released, while the vcpu holding the lock may not be
>> running at all.
>>
>> This creates contention, and the observed slowdown is 40x for
>> hackbench. No, this isn't a typo.
>>
>> The solution is to trap blocking WFEs and tell KVM that we're now
>> spinning. This ensures that other vcpus will get a scheduling boost,
>> allowing the lock to be released more quickly.
>>
>> From a performance point of view: hackbench 1 process 1000
>>
>> 2xA15 host (baseline):  1.843s
>>
>> 2xA15 guest w/o patch:  2.083s
>> 4xA15 guest w/o patch:  80.212s
>>
>> 2xA15 guest w/ patch:   2.072s
>> 4xA15 guest w/ patch:   3.202s
>
> I'm confused. You got from 2.083s when not exiting on spin locks to
> 2.072 when exiting on _every_ spin lock that didn't immediately
> succeed. I would've expected the second number to be worse rather than
> better. I assume it's within jitter, I'm still puzzled why you don't
> see any significant drop in performance.

The key is in the ARM ARM:

B1.14.9: "When HCR.TWE is set to 1, and the processor is in a Non-secure
mode other than Hyp mode, execution of a WFE instruction generates a Hyp
Trap exception if, ignoring the value of the HCR.TWE bit, conditions
permit the processor to suspend execution."

So, on a non-overcommitted system, you rarely hit a blocking spinlock,
and hence you rarely trap. Otherwise, performance would go down the drain
very quickly.

And yes, the difference is pretty much noise.
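
To make the trap condition concrete, here is a rough C model of the
B1.14.9 rule quoted above. It is only a sketch, not code from the patch:
struct wfe_ctx and both helpers are hypothetical names, and "conditions
permit the processor to suspend execution" is simplified to "the event
register is clear", which ignores other pending wake-up events.

```c
#include <stdbool.h>
#include <stdint.h>

/* HCR.TWE is bit 14 of the ARMv7 Hyp Configuration Register. */
#define HCR_TWE (UINT32_C(1) << 14)

/* Hypothetical snapshot of the state that decides whether a WFE traps. */
struct wfe_ctx {
    uint32_t hcr;           /* Hyp Configuration Register value */
    bool     non_secure;    /* executing in Non-secure state */
    bool     hyp_mode;      /* executing in Hyp mode */
    bool     event_pending; /* event register set: WFE returns at once */
};

/* Assumption: the WFE would suspend only if no event is pending. */
static bool wfe_would_block(const struct wfe_ctx *c)
{
    return !c->event_pending;
}

/*
 * Trap to Hyp only when TWE is set, the CPU is in a Non-secure mode
 * other than Hyp mode, and the WFE would really have blocked.
 */
static bool wfe_traps_to_hyp(const struct wfe_ctx *c)
{
    return (c->hcr & HCR_TWE) != 0 &&
           c->non_secure &&
           !c->hyp_mode &&
           wfe_would_block(c);
}
```

In this model, a WFE that finds the event register already set falls
straight through without an exit, so only a WFE that would really have
suspended the vcpu costs a trap.
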
M.
--
Jazz is not dead. It just smells funny...