[PATCH] arm64: KVM: Optimize arm64 guest exit VFP/SIMD register save/restore

Marc Zyngier marc.zyngier at arm.com
Mon Jun 15 11:51:44 PDT 2015


On 15/06/15 19:44, Mario Smarduch wrote:
> On 06/15/2015 11:20 AM, Marc Zyngier wrote:
>> On 15/06/15 19:04, Mario Smarduch wrote:
>>> On 06/15/2015 03:00 AM, Marc Zyngier wrote:
>>>> Hi Mario,
>>>>
>>>> I was working on a more ambitious patch series, but we probably ought to
>>>> start small, and this looks fairly sensible to me.
>>>
>>> Hi Marc,
>>>    thanks for reviewing. I was thinking of posting this
>>> first and, in the next iteration, once the guest has accessed
>>> the registers, switching back to the host registers only upon
>>> return to user space or vCPU context switch. This should save
>>> more cycles for various exits.
>>>
>>> Were you thinking along the same lines or something
>>> altogether different?
>>
>> That's mostly what I had in mind. Basically staying away from touching
>> the FP registers until vcpu_put(). I had it mostly working, but
>> experienced some interesting corruption cases, especially when using
>> 32bit guests.
>>
>>>
>>>>
>>>> A few minor comments below.
>>>>
>>>> On 13/06/15 23:20, Mario Smarduch wrote:
>>>>> Currently VFP/SIMD registers are always saved and restored
>>>>> on Guest entry and exit.
>>>>>
>>>>> This patch only saves and restores VFP/SIMD registers on
>>>>> Guest access. To do this cptr_el2 VFP/SIMD trap is set
>>>>> on Guest entry and later checked on exit. This follows
>>>>> the ARMv7 VFPv3 implementation. In an informal test,
>>>>> a high number of exits don't access VFP/SIMD
>>>>> registers.
>>>>
>>>> It would be good to add some numbers here. How often do we exit without
>>>> having touched the FPSIMD regs? For which workload?
>>>
>>> Lmbench is what I typically use, with an ssh server, i.e., loads that
>>> cause page faults and interrupts; usually the registers are not touched.
>>> I'll run the tests again and quantify "usually".
>>>
>>> Any other loads you had in mind?
>>
>> Not really (apart from running hackbench, of course...;-). I'd just like
>> to see the numbers in the commit message, so that we can document the
>> improvement (and maybe track regressions).
> 
> Ok I understand.
> 
>>
>> [...]
>>
>>>>
>>>>>  	skip_debug_state x3, 1f
>>>>>  	// Clear the dirty flag for the next run, as all the state has
>>>>>  	// already been saved. Note that we nuke the whole 64bit word.
>>>>> @@ -1166,6 +1211,10 @@ el1_sync:					// Guest trapped into EL2
>>>>>  	mrs	x1, esr_el2
>>>>>  	lsr	x2, x1, #ESR_ELx_EC_SHIFT
>>>>>
>>>>> +	/* Guest accessed VFP/SIMD registers, save host, restore Guest */
>>>>> +	cmp	x2, #ESR_ELx_EC_FP_ASIMD
>>>>> +	b.eq	switch_to_guest_vfp
>>>>> +
>>>>
>>>> I'd prefer you moved that hunk to el1_trap, where we handle all the
>>>> traps coming from the guest.
>>>
>>> I'm wondering whether it would make sense to update the ARMv7
>>> side as well. Reading both exit handlers, the flows mirror
>>> each other.
>>
>> The 32bit code is starting to show its age, and could probably do with a
>> refactor. If you have some cycles to spare, that'd be quite interesting.
> 
> Yep, will do, ARMv7 is still very relevant.

You bet it is. My home router is a v7 VM...

	M.
-- 
Jazz is not dead. It just smells funny...
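
For readers following the thread, the lazy switch described in the patch
(set the FP/SIMD trap on guest entry, switch registers only if the guest
actually traps, check the trap on exit) can be modelled roughly as below.
This is a user-space toy in C, not kernel code: every name in it (toy_cpu,
toy_guest_enter, toy_guest_fp_access, toy_guest_exit) is hypothetical, and
the fp_trap flag merely stands in for the cptr_el2 trap bit.

```c
#include <stdbool.h>

#define TOY_NUM_FP_REGS 32

/* Hypothetical stand-in for one FP/SIMD register file. */
struct toy_fp_state {
	unsigned long long regs[TOY_NUM_FP_REGS];
};

struct toy_cpu {
	struct toy_fp_state hw;    /* what is live in the "hardware" registers */
	struct toy_fp_state host;  /* saved host context */
	struct toy_fp_state guest; /* saved guest context */
	bool fp_trap;              /* models the cptr_el2 FP/SIMD trap bit */
	unsigned int switches;     /* number of save+restore pairs performed */
};

/* Guest entry: arm the trap instead of switching registers eagerly. */
void toy_guest_enter(struct toy_cpu *c)
{
	c->fp_trap = true;
}

/* First guest FP/SIMD access traps: save host, restore guest, disarm. */
void toy_guest_fp_access(struct toy_cpu *c)
{
	if (!c->fp_trap)
		return;            /* guest already owns the registers */
	c->host = c->hw;
	c->hw = c->guest;
	c->fp_trap = false;
	c->switches++;
}

/* Guest exit: only switch back if the guest actually took the trap. */
void toy_guest_exit(struct toy_cpu *c)
{
	if (c->fp_trap) {
		c->fp_trap = false; /* registers untouched, nothing to save */
		return;
	}
	c->guest = c->hw;
	c->hw = c->host;
	c->switches++;
}
```

In this model an enter/exit pair with no FP access costs no register
copies at all, which is the case the thread says dominates for page-fault
and interrupt exits. The follow-up idea (deferring the switch back until
vcpu_put()) would move the work in toy_guest_exit to a later point still.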
