[PATCH 1/2] ARM: KVM: Yield CPU when vcpu executes a WFE
Marc Zyngier
marc.zyngier at arm.com
Mon Oct 7 12:55:30 EDT 2013
On 07/10/13 17:30, Alexander Graf wrote:
>
> On 07.10.2013, at 18:16, Marc Zyngier <marc.zyngier at arm.com> wrote:
>
>> On 07/10/13 17:04, Alexander Graf wrote:
>>>
>>> On 07.10.2013, at 17:40, Marc Zyngier <marc.zyngier at arm.com>
>>> wrote:
>>>
>>>> On an (even slightly) oversubscribed system, spinlocks are
>>>> quickly becoming a bottleneck, as some vcpus are spinning,
>>>> waiting for a lock to be released, while the vcpu holding the
>>>> lock may not be running at all.
>>>>
>>>> This creates contention, and the observed slowdown is 40x for
>>>> hackbench. No, this isn't a typo.
>>>>
>>>> The solution is to trap blocking WFEs and tell KVM that we're
>>>> now spinning. This ensures that other vcpus will get a
>>>> scheduling boost, allowing the lock to be released more
>>>> quickly.
>>>>
>>>> From a performance point of view: hackbench 1 process 1000
>>>>
>>>> 2xA15 host (baseline):  1.843s
>>>>
>>>> 2xA15 guest w/o patch:  2.083s
>>>> 4xA15 guest w/o patch: 80.212s
>>>>
>>>> 2xA15 guest w/ patch:   2.072s
>>>> 4xA15 guest w/ patch:   3.202s
>>>
>>> I'm confused. You went from 2.083s when not exiting on spin locks
>>> to 2.072s when exiting on _every_ spin lock that didn't
>>> immediately succeed. I would've expected the second number to be
>>> worse rather than better. I assume it's within jitter, but I'm still
>>> puzzled why you don't see any significant drop in performance.
>>
>> The key is in the ARM ARM:
>>
>> B1.14.9: "When HCR.TWE is set to 1, and the processor is in a
>> Non-secure mode other than Hyp mode, execution of a WFE instruction
>> generates a Hyp Trap exception if, ignoring the value of the
>> HCR.TWE bit, conditions permit the processor to suspend
>> execution."
>>
>> So, on a non-overcommitted system, you rarely hit a blocking
>> spinlock, hence not trapping. Otherwise, performance would go down
>> the drain very quickly.
>
> Well, it's the same as pause/loop exiting on x86, but there we have
> special hardware features to only ever exit after n number of
> turnarounds. I wonder why we have those when we could just as easily
> exit on every blocking path.

My understanding of x86 is extremely patchy (and of the non-existent
flavour), so I can't really comment on that.

On ARM, WFE normally blocks if no event is pending for this CPU. We use
it on the spinlock slow path, and have a SEV (Send EVent) on release.
Even in the case of a race between entering the slow path and releasing
the spinlock, you may end up executing a non-blocking WFE. In this case,
no trap will occur.
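
To make the trap condition concrete, here is a minimal sketch in plain C
that models the event register and the HCR.TWE rule quoted above. It is a
toy model, not kernel code: struct cpu_model, model_sev() and model_wfe()
are made-up names, and it assumes a single CPU, whereas a real SEV signals
every CPU in the system.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one CPU's WFE-related state (illustration only). */
struct cpu_model {
	bool event_pending;	/* per-CPU event register, set by SEV */
	bool hcr_twe;		/* HCR.TWE: trap blocking WFEs to Hyp mode */
};

enum wfe_result {
	WFE_FALLS_THROUGH,	/* event was pending: no block, no trap */
	WFE_BLOCKS,		/* no event, TWE clear: CPU suspends */
	WFE_TRAPS_TO_HYP,	/* no event, TWE set: Hyp Trap exception */
};

/* SEV on lock release: mark an event as pending for the waiter. */
static void model_sev(struct cpu_model *cpu)
{
	assert(cpu != NULL);
	cpu->event_pending = true;
}

/*
 * WFE on the spinlock slow path. A pending event is consumed and the
 * instruction completes at once. Only a WFE that would really suspend
 * execution is eligible to trap, and it traps only if HCR.TWE is set.
 */
static enum wfe_result model_wfe(struct cpu_model *cpu)
{
	assert(cpu != NULL);
	if (cpu->event_pending) {
		cpu->event_pending = false;
		return WFE_FALLS_THROUGH;
	}
	return cpu->hcr_twe ? WFE_TRAPS_TO_HYP : WFE_BLOCKS;
}
```

In this model, a model_wfe() that follows a model_sev() falls through even
with hcr_twe set, which is the race case described above. Only a WFE with no
pending event reaches the hypervisor.
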
> I assume you simply don't contend on spin locks yet. Once you have
> more guest cores things would look different. So once you have a
> system with more cores available, it might make sense to measure it
> again.

Indeed. Though the above should probably stay valid even if we have a
different locking strategy. Entering a blocking WFE always means you're
going to block for some time (and no, you don't know how long).

> Until then, the numbers are impressive.

I thought as much...

M.
--
Jazz is not dead. It just smells funny...