[PATCH v2] KVM: arm64: Initialize VCPU mdcr_el2 before loading it

Alexandru Elisei alexandru.elisei at arm.com
Wed Mar 31 16:25:46 BST 2021


Hi Marc,

On 3/30/21 9:07 PM, Marc Zyngier wrote:
> On Tue, 30 Mar 2021 18:13:07 +0100,
> Alexandru Elisei <alexandru.elisei at arm.com> wrote:
>> Hi Marc,
>>
>> Thanks for having a look!
>>
>> On 3/30/21 10:55 AM, Marc Zyngier wrote:
>>> Hi Alex,
>>>
>>> On Tue, 23 Mar 2021 18:00:57 +0000,
>>> Alexandru Elisei <alexandru.elisei at arm.com> wrote:
>>>> When a VCPU is created, the kvm_vcpu struct is initialized to zero in
>>>> kvm_vm_ioctl_create_vcpu(). On VHE systems, the first time
>>>> vcpu.arch.mdcr_el2 is loaded on hardware is in vcpu_load(), before it is
>>>> set to a sensible value in kvm_arm_setup_debug() later in the run loop. The
>>>> result is that KVM executes for a short time with MDCR_EL2 set to zero.
>>>>
>>>> This has several unintended consequences:
>>>>
>>>> * Setting MDCR_EL2.HPMN to 0 is constrained unpredictable according to ARM
>>>>   DDI 0487G.a, page D13-3820. The behavior specified by the architecture
>>>>   in this case is for the PE to behave as if MDCR_EL2.HPMN is set to a
>>>>   value less than or equal to PMCR_EL0.N, which means that an unknown
>>>>   number of counters are now disabled by MDCR_EL2.HPME, which is zero.
>>>>
>>>> * The host configuration for the other debug features controlled by
>>>>   MDCR_EL2 is temporarily lost. This has been harmless so far, as Linux
>>>>   doesn't use the other fields, but that might change in the future.
>>>>
>>>> Let's avoid both issues by initializing the VCPU's mdcr_el2 field in
>>>> kvm_vcpu_vcpu_first_run_init(), thus making sure that the MDCR_EL2 register
>>>> has a consistent value after each vcpu_load().
>>>>
>>>> Signed-off-by: Alexandru Elisei <alexandru.elisei at arm.com>
>>> This looks strangely similar to 4942dc6638b0 ("KVM: arm64: Write
>>> arch.mdcr_el2 changes since last vcpu_load on VHE"), just at a
>>> different point. Probably worth a Fixes tag.
>> This bug is present in the commit you mention, and from what
>> I can tell it's also present in the commit that one fixes (d5a21bcc2995
>> ("KVM: arm64: Move common VHE/non-VHE trap config in separate
>> functions")) - vcpu->arch.mdcr_el2 is computed in
>> kvm_arm_setup_debug(), which is called after vcpu_load(). My guess
>> is that this bug dates back to when VHE support was added (or soon after).
> Right. Can you please add a Fixes: tag for the same commit? At least
> that'd be consistent.

Yes, I'll do that.

>
>> I can dig further, how far back in time should I aim for?
>>
>>>> ---
>>>> Found by code inspection. Based on v5.12-rc4.
>>>>
>>>> Tested on an odroid-c4 with VHE. vcpu->arch.mdcr_el2 is calculated to be
>>>> 0x4e66. Without this patch, reading MDCR_EL2 after the first vcpu_load() in
>>>> kvm_arch_vcpu_ioctl_run() returns 0; with this patch it returns the correct
>>>> value, 0xe66 (FEAT_SPE is not implemented by the PE).
>>>>
>>>> This patch was initially part of the KVM SPE series [1], but those patches
>>>> haven't seen much activity, so I thought it would be a good idea to send
>>>> this patch separately to draw more attention to it.
>>>>
>>>> Changes in v2:
>>>> * Moved kvm_arm_vcpu_init_debug() earlier in kvm_vcpu_first_run_init() so
>>>>   vcpu->arch.mdcr_el2 is calculated even if kvm_vgic_map_resources() fails.
>>>> * Added comment to kvm_arm_setup_mdcr_el2 to explain what testing
>>>>   vcpu->guest_debug means.
>>>>
>>>> [1] https://www.spinics.net/lists/kvm-arm/msg42959.html
>>>>
>>>>  arch/arm64/include/asm/kvm_host.h |  1 +
>>>>  arch/arm64/kvm/arm.c              |  3 +-
>>>>  arch/arm64/kvm/debug.c            | 82 +++++++++++++++++++++----------
>>>>  3 files changed, 59 insertions(+), 27 deletions(-)
>>>>
>>>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>>>> index 3d10e6527f7d..858c2fcfc043 100644
>>>> --- a/arch/arm64/include/asm/kvm_host.h
>>>> +++ b/arch/arm64/include/asm/kvm_host.h
>>>> @@ -713,6 +713,7 @@ static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
>>>>  static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {}
>>>>  
>>>>  void kvm_arm_init_debug(void);
>>>> +void kvm_arm_vcpu_init_debug(struct kvm_vcpu *vcpu);
>>>>  void kvm_arm_setup_debug(struct kvm_vcpu *vcpu);
>>>>  void kvm_arm_clear_debug(struct kvm_vcpu *vcpu);
>>>>  void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu);
>>>> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
>>>> index 7f06ba76698d..7088d8fe7186 100644
>>>> --- a/arch/arm64/kvm/arm.c
>>>> +++ b/arch/arm64/kvm/arm.c
>>>> @@ -580,6 +580,8 @@ static int kvm_vcpu_first_run_init(struct kvm_vcpu *vcpu)
>>>>  
>>>>  	vcpu->arch.has_run_once = true;
>>>>  
>>>> +	kvm_arm_vcpu_init_debug(vcpu);
>>>> +
>>>>  	if (likely(irqchip_in_kernel(kvm))) {
>>>>  		/*
>>>>  		 * Map the VGIC hardware resources before running a vcpu the
>>>> @@ -791,7 +793,6 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
>>>>  		}
>>>>  
>>>>  		kvm_arm_setup_debug(vcpu);
>>>> -
>>> Spurious change?
>> Definitely, thank you for spotting it.
>>
>>>>  		/**************************************************************
>>>>  		 * Enter the guest
>>>>  		 */
>>>> diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
>>>> index 7a7e425616b5..3626d03354f6 100644
>>>> --- a/arch/arm64/kvm/debug.c
>>>> +++ b/arch/arm64/kvm/debug.c
>>>> @@ -68,6 +68,60 @@ void kvm_arm_init_debug(void)
>>>>  	__this_cpu_write(mdcr_el2, kvm_call_hyp_ret(__kvm_get_mdcr_el2));
>>>>  }
>>>>  
>>>> +/**
>>>> + * kvm_arm_setup_mdcr_el2 - configure vcpu mdcr_el2 value
>>>> + *
>>>> + * @vcpu:	the vcpu pointer
>>>> + * @host_mdcr:  host mdcr_el2 value
>>>> + *
>>>> + * This ensures we will trap access to:
>>>> + *  - Performance monitors (MDCR_EL2_TPM/MDCR_EL2_TPMCR)
>>>> + *  - Debug ROM Address (MDCR_EL2_TDRA)
>>>> + *  - OS related registers (MDCR_EL2_TDOSA)
>>>> + *  - Statistical profiler (MDCR_EL2_TPMS/MDCR_EL2_E2PB)
>>>> + */
>>>> +static void kvm_arm_setup_mdcr_el2(struct kvm_vcpu *vcpu, u32 host_mdcr)
>>>> +{
>>>> +	bool trap_debug = !(vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY);
>>>> +
>>>> +	/*
>>>> +	 * This also clears MDCR_EL2_E2PB_MASK to disable guest access
>>>> +	 * to the profiling buffer.
>>>> +	 */
>>>> +	vcpu->arch.mdcr_el2 = host_mdcr & MDCR_EL2_HPMN_MASK;
>>>> +	vcpu->arch.mdcr_el2 |= (MDCR_EL2_TPM |
>>>> +				MDCR_EL2_TPMS |
>>>> +				MDCR_EL2_TPMCR |
>>>> +				MDCR_EL2_TDRA |
>>>> +				MDCR_EL2_TDOSA);
>>>> +
>>>> +	/* Is the VM being debugged by userspace? */
>>>> +	if (vcpu->guest_debug) {
>>>> +		/* Route all software debug exceptions to EL2 */
>>>> +		vcpu->arch.mdcr_el2 |= MDCR_EL2_TDE;
>>>> +		if (vcpu->guest_debug & KVM_GUESTDBG_USE_HW)
>>>> +			trap_debug = true;
>>>> +	}
>>>> +
>>>> +	/* Trap debug register access */
>>>> +	if (trap_debug)
>>>> +		vcpu->arch.mdcr_el2 |= MDCR_EL2_TDA;
>>>> +
>>>> +	trace_kvm_arm_set_dreg32("MDCR_EL2", vcpu->arch.mdcr_el2);
>>>> +}
>>>> +
>>>> +/**
>>>> + * kvm_arm_vcpu_init_debug - setup vcpu debug traps
>>>> + *
>>>> + * @vcpu:	the vcpu pointer
>>>> + *
>>>> + * Set vcpu initial mdcr_el2 value.
>>>> + */
>>>> +void kvm_arm_vcpu_init_debug(struct kvm_vcpu *vcpu)
>>>> +{
>>>> +	kvm_arm_setup_mdcr_el2(vcpu, this_cpu_read(mdcr_el2));
>>> Given that kvm_arm_setup_mdcr_el2() always takes the current host
>>> value for mdcr_el2, why not moving the read into it and be done with
>>> it?
>> kvm_arm_setup_debug() is called with preemption disabled, and it can
>> use __this_cpu_read(). kvm_arm_vcpu_init_debug() is called with
>> preemption enabled, so it must use this_cpu_read(). I wanted to make
>> the distinction because kvm_arm_setup_debug() is in the run loop.
> I think it would be absolutely fine to make the slow path of
> kvm_vcpu_first_run_init() run with preempt disabled. This happens so
> rarely that it isn't worth thinking about it.

It looks to me like running the entire kvm_vcpu_first_run_init() function with
preemption disabled just for the __this_cpu_read() in kvm_arm_setup_mdcr_el2() is
a bit heavy-handed. Not because of the performance cost (negligible, since the
function runs exactly once in a VCPU's lifetime), but because it wouldn't be
obvious to a reader why preemption is disabled there.

I tried this:

@@ -580,7 +580,9 @@ static int kvm_vcpu_first_run_init(struct kvm_vcpu *vcpu)
 
        vcpu->arch.has_run_once = true;
 
-       kvm_arm_vcpu_init_debug(vcpu);
+       preempt_disable();
+       kvm_arm_setup_mdcr_el2(vcpu);
+       preempt_enable();
 
        if (likely(irqchip_in_kernel(kvm))) {
                /*

and it still looks a bit off to me: preemption is disabled here only because of an
implementation detail of kvm_arm_setup_mdcr_el2(), even though the function itself
just operates on the VCPU struct, which is safe to do with preemption enabled.

I was thinking something like this:

@@ -119,7 +119,9 @@ static void kvm_arm_setup_mdcr_el2(struct kvm_vcpu *vcpu, u32 host_mdcr)
  */
 void kvm_arm_vcpu_init_debug(struct kvm_vcpu *vcpu)
 {
-       kvm_arm_setup_mdcr_el2(vcpu, this_cpu_read(mdcr_el2));
+       preempt_disable();
+       kvm_arm_setup_mdcr_el2(vcpu);
+       preempt_enable();
 }
 
 /**

What do you think?

Thanks,

Alex

>
> Please give it a lockdep run though! ;-)
>
>>> Also, do we really need an extra wrapper?
>> I can remove the wrapper and have kvm_arm_setup_mdcr_el2() use
>> this_cpu_read() for the host's mdcr_el2 value, at the cost of a
>> redundant preempt disable/enable in the run loop, where preemption
>> is already disabled. If you think that would make the code easier to follow, I
>> can certainly do that.
> As explained above, I'd rather you keep the __this_cpu_read() and make
> kvm_vcpu_first_run_init() preemption safe.
>
> Thanks,
>
> 	M.
>



More information about the linux-arm-kernel mailing list