[PATCH 2/3] KVM: arm64: Access CNTHCTL_EL2 bit fields correctly on VHE systems

Marc Zyngier marc.zyngier at arm.com
Fri Jan 13 05:57:23 PST 2017


On 13/01/17 13:46, Christoffer Dall wrote:
> On Fri, Jan 13, 2017 at 01:30:29PM +0000, Marc Zyngier wrote:
>> [+ Suzuki, who wrote the whole cpus_have_const_cap thing]
>>
>> On 13/01/17 12:36, Christoffer Dall wrote:
>>> On Fri, Jan 13, 2017 at 11:31:32AM +0000, Marc Zyngier wrote:
>>>> From: Jintack Lim <jintack at cs.columbia.edu>
>>>>
>>>> The current KVM world switch code unintentionally sets the wrong bits in
>>>> CNTHCTL_EL2 when E2H == 1, which may allow the guest OS to access the
>>>> physical timer.  The bit positions in CNTHCTL_EL2 change depending on
>>>> the HCR_EL2.E2H bit: EL1PCEN and EL1PCTEN are bits 1 and 0 when E2H is
>>>> clear, but bits 11 and 10 respectively when E2H is set.
>>>>
>>>> In fact, on VHE we only need to set those bits once, not for every world
>>>> switch. This is because the host kernel runs in EL2 with HCR_EL2.TGE ==
>>>> 1, which makes those bits have no effect for the host kernel execution.
>>>> So we just set those bits once for guests, and that's it.
>>>>
>>>> Signed-off-by: Jintack Lim <jintack at cs.columbia.edu>
>>>> Reviewed-by: Marc Zyngier <marc.zyngier at arm.com>
>>>> Signed-off-by: Marc Zyngier <marc.zyngier at arm.com>
>>>> ---
>>>>  arch/arm/include/asm/virt.h   |  5 +++++
>>>>  arch/arm/kvm/arm.c            |  3 +++
>>>>  arch/arm64/include/asm/virt.h |  9 +++++++++
>>>>  include/kvm/arm_arch_timer.h  |  1 +
>>>>  virt/kvm/arm/arch_timer.c     | 23 +++++++++++++++++++++++
>>>>  virt/kvm/arm/hyp/timer-sr.c   | 33 +++++++++++++++++++++------------
>>>>  6 files changed, 62 insertions(+), 12 deletions(-)
>>>>
>>>> diff --git a/arch/arm/include/asm/virt.h b/arch/arm/include/asm/virt.h
>>>> index a2e75b8..6dae195 100644
>>>> --- a/arch/arm/include/asm/virt.h
>>>> +++ b/arch/arm/include/asm/virt.h
>>>> @@ -80,6 +80,11 @@ static inline bool is_kernel_in_hyp_mode(void)
>>>>  	return false;
>>>>  }
>>>>  
>>>> +static inline bool has_vhe(void)
>>>> +{
>>>> +	return false;
>>>> +}
>>>> +
>>>>  /* The section containing the hypervisor idmap text */
>>>>  extern char __hyp_idmap_text_start[];
>>>>  extern char __hyp_idmap_text_end[];
>>>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>>>> index 1167678..9d74464 100644
>>>> --- a/arch/arm/kvm/arm.c
>>>> +++ b/arch/arm/kvm/arm.c
>>>> @@ -1099,6 +1099,9 @@ static void cpu_init_hyp_mode(void *dummy)
>>>>  	__cpu_init_hyp_mode(pgd_ptr, hyp_stack_ptr, vector_ptr);
>>>>  	__cpu_init_stage2();
>>>>  
>>>> +	if (is_kernel_in_hyp_mode())
>>>> +		kvm_timer_init_vhe();
>>>> +
>>>>  	kvm_arm_init_debug();
>>>>  }
>>>>  
>>>> diff --git a/arch/arm64/include/asm/virt.h b/arch/arm64/include/asm/virt.h
>>>> index fea1073..439f6b5 100644
>>>> --- a/arch/arm64/include/asm/virt.h
>>>> +++ b/arch/arm64/include/asm/virt.h
>>>> @@ -47,6 +47,7 @@
>>>>  #include <asm/ptrace.h>
>>>>  #include <asm/sections.h>
>>>>  #include <asm/sysreg.h>
>>>> +#include <asm/cpufeature.h>
>>>>  
>>>>  /*
>>>>   * __boot_cpu_mode records what mode CPUs were booted in.
>>>> @@ -80,6 +81,14 @@ static inline bool is_kernel_in_hyp_mode(void)
>>>>  	return read_sysreg(CurrentEL) == CurrentEL_EL2;
>>>>  }
>>>>  
>>>> +static inline bool has_vhe(void)
>>>> +{
>>>> +	if (cpus_have_const_cap(ARM64_HAS_VIRT_HOST_EXTN))
>>>> +		return true;
>>>> +
>>>> +	return false;
>>>> +}
>>>> +
>>>
>>> I was experimenting with using has_vhe for some of the optimization code
>>> I was writing, and I saw a hyp crash as a result.  That made me wonder
>>> if this is really safe in Hyp mode?
>>>
>>> Specifically, there is no guarantee that this will actually be inlined
>>> in the caller, right?  At least that's what I can gather from trying to
>>> understand the semantics of the inline keyword in the GCC manual.
>>
>> Indeed, there is no strict guarantee that this is enforced. We should
>> probably have __always_inline instead. But having checked the generated
>> code for __timer_restore_state, the function is definitely inlined
>> (gcc 6.2). Happy to queue an extra patch changing that.
>>
>>> Further, are we guaranteed that the static branch gets compiled into
>>> something that doesn't actually look at cpu_hwcap_keys, which is not
>>> mapped in hyp mode?
>>
>> Here's the disassembly:
>>
>> ffff000008ad01d0 <__timer_restore_state>:
>> ffff000008ad01d0:       f9400001        ldr     x1, [x0]
>> ffff000008ad01d4:       9240bc21        and     x1, x1, #0xffffffffffff
>> ffff000008ad01d8:       d503201f        nop
>> ffff000008ad01dc:       d503201f        nop
>> ffff000008ad01e0:       d53ce102        mrs     x2, cnthctl_el2
>> ffff000008ad01e4:       927ef842        and     x2, x2, #0xfffffffffffffffd
>> ffff000008ad01e8:       b2400042        orr     x2, x2, #0x1
>> ffff000008ad01ec:       d51ce102        msr     cnthctl_el2, x2
>> ffff000008ad01f0:       d2834002        mov     x2, #0x1a00                     // #6656
>> ffff000008ad01f4:       8b020000        add     x0, x0, x2
>> ffff000008ad01f8:       91038002        add     x2, x0, #0xe0
>> ffff000008ad01fc:       39425443        ldrb    w3, [x2,#149]
>> ffff000008ad0200:       34000103        cbz     w3, ffff000008ad0220 <__timer_restore_state+0x50>
>> ffff000008ad0204:       f945a821        ldr     x1, [x1,#2896]
>> ffff000008ad0208:       d51ce061        msr     cntvoff_el2, x1
>> ffff000008ad020c:       f9400441        ldr     x1, [x2,#8]
>> ffff000008ad0210:       d51be341        msr     cntv_cval_el0, x1
>> ffff000008ad0214:       d5033fdf        isb
>> ffff000008ad0218:       b940e000        ldr     w0, [x0,#224]
>> ffff000008ad021c:       d51be320        msr     cntv_ctl_el0, x0
>> ffff000008ad0220:       d65f03c0        ret
>>
>> The static branch resolves as such when VHE is enabled (taken from
>> a running model):
>>
>> ffff000008ad01d0 <__timer_restore_state>:
>> ffff000008ad01d0:       f9400001        ldr     x1, [x0]
>> ffff000008ad01d4:       9240bc21        and     x1, x1, #0xffffffffffff
>> ffff000008ad01d8:       d503201f        nop
>> ffff000008ad01dc:       14000005        b	ffff000008ad01f0
>> ffff000008ad01e0:       d53ce102        mrs     x2, cnthctl_el2
>> [...]
>>
>> That's using a toolchain that supports the "asm goto" feature that is used
>> to implement static branches (and that's guaranteed not to generate any
>> memory access other than the code patching itself).
>>
>> Now, with a toolchain that doesn't support this, such as gcc 4.8:
> 
> Hmm, I saw the error with 5.4.1, but perhaps I messed something else up,
> because I cannot seem to reproduce this at the moment.
> 
>>
>> ffff000008aa5168 <__timer_restore_state>:
>> ffff000008aa5168:       f9400001        ldr     x1, [x0]
>> ffff000008aa516c:       9240bc21        and     x1, x1, #0xffffffffffff
>> ffff000008aa5170:       d503201f        nop
>> ffff000008aa5174:       f00038a2        adrp    x2, ffff0000091bc000 <reset_devices>
>> ffff000008aa5178:       9113e042        add     x2, x2, #0x4f8
>> ffff000008aa517c:       b9402c42        ldr     w2, [x2,#44]
>> ffff000008aa5180:       6b1f005f        cmp     w2, wzr
>> ffff000008aa5184:       540000ac        b.gt    ffff000008aa5198 <__timer_restore_state+0x30>
>> ffff000008aa5188:       d53ce102        mrs     x2, cnthctl_el2
>> ffff000008aa518c:       927ef842        and     x2, x2, #0xfffffffffffffffd
>> ffff000008aa5190:       b2400042        orr     x2, x2, #0x1
>> ffff000008aa5194:       d51ce102        msr     cnthctl_el2, x2
>> ffff000008aa5198:       91400402        add     x2, x0, #0x1, lsl #12
>> ffff000008aa519c:       396dd443        ldrb    w3, [x2,#2933]
>> ffff000008aa51a0:       34000103        cbz     w3, ffff000008aa51c0 <__timer_restore_state+0x58>
>> ffff000008aa51a4:       f945a821        ldr     x1, [x1,#2896]
>> ffff000008aa51a8:       d51ce061        msr     cntvoff_el2, x1
>> ffff000008aa51ac:       f9457441        ldr     x1, [x2,#2792]
>> ffff000008aa51b0:       d51be341        msr     cntv_cval_el0, x1
>> ffff000008aa51b4:       d5033fdf        isb
>> ffff000008aa51b8:       b95ae000        ldr     w0, [x0,#6880]
>> ffff000008aa51bc:       d51be320        msr     cntv_ctl_el0, x0
>> ffff000008aa51c0:       d65f03c0        ret
>>
>> This is now controlled by some data located at FFFF0000091BC524:
>>
>> maz at approximate:~/Work/arm-platforms$ aarch64-linux-gnu-objdump -h vmlinux
>>
>> vmlinux:     file format elf64-littleaarch64
>>
>> Sections:
>> Idx Name          Size      VMA               LMA               File off  Algn
>> [...]
>>  23 .bss          000da348  ffff0000091b8000  ffff0000091b8000  01147a00  2**12
>>                   ALLOC
>>
>> That's the BSS, which we do map in HYP (a fairly recent addition).
> 
> But we cannot map the BSS at the same address though, right?  So
> wouldn't this actually fail?

We map it at the same relative offset, and use adrp to get the base
address (PC relative). So whatever context we're in, we should be OK.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...
