[PATCH v2 00/21] arm64: KVM: world switch in C

Tue Dec 1 09:51:46 PST 2015

On 01/12/15 12:00, Christoffer Dall wrote:
> On Tue, Dec 01, 2015 at 09:58:23AM +0000, Marc Zyngier wrote:
>> On 30/11/15 20:33, Christoffer Dall wrote:
>>> On Fri, Nov 27, 2015 at 06:49:54PM +0000, Marc Zyngier wrote:
>>>> Once upon a time, the KVM/arm64 world switch was a nice, clean, lean
>>>> and mean piece of hand-crafted assembly code. Over time, features have
>>>> crept in, the code has become harder to maintain, and the smallest
>>>> change is a pain to introduce. The VHE patches are a prime example of
>>>> why this doesn't work anymore.
>>>>
>>>> This series rewrites most of the existing assembly code in C, but keeps
>>>> the existing code structure in place (most function names will look
>>>> familiar to the reader). The biggest change is that we don't have to
>>>> deal with a static register allocation (the compiler does it for us),
>>>> we can easily follow structure and pointers, and only the lowest level
>>>> is still in assembly code. Oh, and a negative diffstat.
>>>>
>>>> There is still a healthy dose of inline assembly (system register
>>>> accessors, runtime code patching), but I've tried not to make it too
>>>> invasive. The generated code, while not exactly brilliant, doesn't
>>>> look too shabby. I do expect a small performance degradation, but I
>>>> believe this is something we can improve over time (my initial
>>>> measurements don't show any obvious regression though).
>>>
>>> I ran this through my experimental setup on m400 and got this:
>>
>> [...]
>>
>>> What this tells me is that we do take a noticeable hit on the
>>> world-switch path, which shows up in the TCP_RR and hackbench workloads,
>>> which have a high precision in their output.
>>>
>>> Note that the memcached number is well within its variability between
>>> individual benchmark runs, where it varies to 12% of its average in over
>>> 80% of the executions.
>>>
>>> I don't think this is a showstopper though, but we could consider
>>> looking more closely at a breakdown of the world-switch path and verify
>>> if/where we are really taking a hit.
>>
>> Thanks for doing so, very interesting. As a data point, what compiler
>> are you using? I'd expect some variability based on the compiler version...
>>
> I used the following (compiling natively on the m400):
> 
> gcc version 4.8.2 (Ubuntu/Linaro 4.8.2-19ubuntu1)

For what it is worth, I've run hackbench on my Seattle B0 (8xA57 2GHz),
with a 4 vcpu VM and got the following results (10 runs per kernel
version, same configuration):

v4.4-rc3-wsinc: Average 31.750
32.459
32.124
32.435
31.940
31.085
31.804
31.862
30.985
31.450
31.359

v4.4-rc3: Average 31.954
31.806
31.598
32.697
31.472
31.410
32.562
31.938
31.932
31.672
32.459

This is with GCC as produced by Linaro:
aarch64-linux-gnu-gcc (Linaro GCC 5.1-2015.08) 5.1.1 20150608

It could well be that your compiler generates worse code than the one I
use, or that the code it outputs is badly tuned for XGene. I guess I
need to unearth my Mustang to find out...
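
To put the two averages above in context, here is a minimal sketch that
recomputes them and compares the gap against the run-to-run spread. The
numbers are copied from the two lists above; the helper names
(summarize, relative_change) are made up for illustration, and the use
of the sample standard deviation as a noise estimate is an assumption,
not a claim of statistical rigour:

```python
import statistics

# Hackbench results copied from the two lists above (10 runs per kernel).
WSINC = [32.459, 32.124, 32.435, 31.940, 31.085,
         31.804, 31.862, 30.985, 31.450, 31.359]   # v4.4-rc3-wsinc
BASE = [31.806, 31.598, 32.697, 31.472, 31.410,
        32.562, 31.938, 31.932, 31.672, 32.459]    # v4.4-rc3


def summarize(runs):
    """Return (mean, sample standard deviation) for a list of results."""
    return statistics.mean(runs), statistics.stdev(runs)


def relative_change(new, old):
    """Percentage change of mean(new) relative to mean(old)."""
    old_mean = statistics.mean(old)
    return 100.0 * (statistics.mean(new) - old_mean) / old_mean


if __name__ == "__main__":
    for name, runs in (("v4.4-rc3-wsinc", WSINC), ("v4.4-rc3", BASE)):
        mean, spread = summarize(runs)
        print(f"{name}: mean {mean:.3f}, stdev {spread:.3f}")
    print(f"relative change: {relative_change(WSINC, BASE):+.2f}%")
```

With these 10 runs per kernel, the difference between the two means is
smaller than the standard deviation of either set, so on this machine
the two kernels look indistinguishable as far as hackbench goes.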

	M.
-- 
Jazz is not dead. It just smells funny...