[PATCH v2] KVM: arm64: Initialize VCPU mdcr_el2 before loading it

Marc Zyngier maz at kernel.org
Tue Mar 30 21:07:21 BST 2021


On Tue, 30 Mar 2021 18:13:07 +0100,
Alexandru Elisei <alexandru.elisei at arm.com> wrote:
> 
> Hi Marc,
> 
> Thanks for having a look!
> 
> On 3/30/21 10:55 AM, Marc Zyngier wrote:
> > Hi Alex,
> >
> > On Tue, 23 Mar 2021 18:00:57 +0000,
> > Alexandru Elisei <alexandru.elisei at arm.com> wrote:
> >> When a VCPU is created, the kvm_vcpu struct is initialized to zero in
> >> kvm_vm_ioctl_create_vcpu(). On VHE systems, the first time
> >> vcpu.arch.mdcr_el2 is loaded on hardware is in vcpu_load(), before it is
> >> set to a sensible value in kvm_arm_setup_debug() later in the run loop. The
> >> result is that KVM executes for a short time with MDCR_EL2 set to zero.
> >>
> >> This has several unintended consequences:
> >>
> >> * Setting MDCR_EL2.HPMN to 0 is constrained unpredictable according to ARM
> >>   DDI 0487G.a, page D13-3820. The behavior specified by the architecture
> >>   in this case is for the PE to behave as if MDCR_EL2.HPMN is set to a
> >>   value less than or equal to PMCR_EL0.N, which means that an unknown
> >>   number of counters are now disabled by MDCR_EL2.HPME, which is zero.
> >>
> >> * The host configuration for the other debug features controlled by
> >>   MDCR_EL2 is temporarily lost. This has been harmless so far, as Linux
> >>   doesn't use the other fields, but that might change in the future.
> >>
> >> Let's avoid both issues by initializing the VCPU's mdcr_el2 field in
> >> kvm_vcpu_first_run_init(), thus making sure that the MDCR_EL2 register
> >> has a consistent value after each vcpu_load().
> >>
> >> Signed-off-by: Alexandru Elisei <alexandru.elisei at arm.com>
> > This looks strangely similar to 4942dc6638b0 ("KVM: arm64: Write
> > arch.mdcr_el2 changes since last vcpu_load on VHE"), just at a
> > different point. Probably worth a Fixes tag.
> 
> This bug is present in the commit you are mentioning, and from what
> I can tell it's also present in the commit it's fixing (d5a21bcc2995
> ("KVM: arm64: Move common VHE/non-VHE trap config in separate
> functions")): vcpu->arch.mdcr_el2 is computed in
> kvm_arm_setup_debug(), which is called after vcpu_load(). My guess
> is that this bug has been there since VHE support was added (or soon
> after).

Right. Can you please add a Fixes: tag for the same commit? At least
that'd be consistent.

> I can dig further, how far back in time should I aim for?
> 
> >
> >> ---
> >> Found by code inspection. Based on v5.12-rc4.
> >>
> >> Tested on an odroid-c4 with VHE. vcpu->arch.mdcr_el2 is calculated to be
> >> 0x4e66. Without this patch, reading MDCR_EL2 after the first vcpu_load() in
> >> kvm_arch_vcpu_ioctl_run() returns 0; with this patch it returns the correct
> >> value, 0xe66 (FEAT_SPE is not implemented by the PE).
> >>
> >> This patch was initially part of the KVM SPE series [1], but those patches
> >> haven't seen much activity, so I thought it would be a good idea to send
> >> this patch separately to draw more attention to it.
> >>
> >> Changes in v2:
> >> * Moved kvm_arm_vcpu_init_debug() earlier in kvm_vcpu_first_run_init() so
> >>   vcpu->arch.mdcr_el2 is calculated even if kvm_vgic_map_resources() fails.
> >> * Added comment to kvm_arm_setup_mdcr_el2 to explain what testing
> >>   vcpu->guest_debug means.
> >>
> >> [1] https://www.spinics.net/lists/kvm-arm/msg42959.html
> >>
> >>  arch/arm64/include/asm/kvm_host.h |  1 +
> >>  arch/arm64/kvm/arm.c              |  3 +-
> >>  arch/arm64/kvm/debug.c            | 82 +++++++++++++++++++++----------
> >>  3 files changed, 59 insertions(+), 27 deletions(-)
> >>
> >> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> >> index 3d10e6527f7d..858c2fcfc043 100644
> >> --- a/arch/arm64/include/asm/kvm_host.h
> >> +++ b/arch/arm64/include/asm/kvm_host.h
> >> @@ -713,6 +713,7 @@ static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
> >>  static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {}
> >>  
> >>  void kvm_arm_init_debug(void);
> >> +void kvm_arm_vcpu_init_debug(struct kvm_vcpu *vcpu);
> >>  void kvm_arm_setup_debug(struct kvm_vcpu *vcpu);
> >>  void kvm_arm_clear_debug(struct kvm_vcpu *vcpu);
> >>  void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu);
> >> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> >> index 7f06ba76698d..7088d8fe7186 100644
> >> --- a/arch/arm64/kvm/arm.c
> >> +++ b/arch/arm64/kvm/arm.c
> >> @@ -580,6 +580,8 @@ static int kvm_vcpu_first_run_init(struct kvm_vcpu *vcpu)
> >>  
> >>  	vcpu->arch.has_run_once = true;
> >>  
> >> +	kvm_arm_vcpu_init_debug(vcpu);
> >> +
> >>  	if (likely(irqchip_in_kernel(kvm))) {
> >>  		/*
> >>  		 * Map the VGIC hardware resources before running a vcpu the
> >> @@ -791,7 +793,6 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
> >>  		}
> >>  
> >>  		kvm_arm_setup_debug(vcpu);
> >> -
> > Spurious change?
> 
> Definitely, thank you for spotting it.
> 
> >
> >>  		/**************************************************************
> >>  		 * Enter the guest
> >>  		 */
> >> diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
> >> index 7a7e425616b5..3626d03354f6 100644
> >> --- a/arch/arm64/kvm/debug.c
> >> +++ b/arch/arm64/kvm/debug.c
> >> @@ -68,6 +68,60 @@ void kvm_arm_init_debug(void)
> >>  	__this_cpu_write(mdcr_el2, kvm_call_hyp_ret(__kvm_get_mdcr_el2));
> >>  }
> >>  
> >> +/**
> >> + * kvm_arm_setup_mdcr_el2 - configure vcpu mdcr_el2 value
> >> + *
> >> + * @vcpu:	the vcpu pointer
> >> + * @host_mdcr:  host mdcr_el2 value
> >> + *
> >> + * This ensures we will trap access to:
> >> + *  - Performance monitors (MDCR_EL2_TPM/MDCR_EL2_TPMCR)
> >> + *  - Debug ROM Address (MDCR_EL2_TDRA)
> >> + *  - OS related registers (MDCR_EL2_TDOSA)
> >> + *  - Statistical profiler (MDCR_EL2_TPMS/MDCR_EL2_E2PB)
> >> + */
> >> +static void kvm_arm_setup_mdcr_el2(struct kvm_vcpu *vcpu, u32 host_mdcr)
> >> +{
> >> +	bool trap_debug = !(vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY);
> >> +
> >> +	/*
> >> +	 * This also clears MDCR_EL2_E2PB_MASK to disable guest access
> >> +	 * to the profiling buffer.
> >> +	 */
> >> +	vcpu->arch.mdcr_el2 = host_mdcr & MDCR_EL2_HPMN_MASK;
> >> +	vcpu->arch.mdcr_el2 |= (MDCR_EL2_TPM |
> >> +				MDCR_EL2_TPMS |
> >> +				MDCR_EL2_TPMCR |
> >> +				MDCR_EL2_TDRA |
> >> +				MDCR_EL2_TDOSA);
> >> +
> >> +	/* Is the VM being debugged by userspace? */
> >> +	if (vcpu->guest_debug) {
> >> +		/* Route all software debug exceptions to EL2 */
> >> +		vcpu->arch.mdcr_el2 |= MDCR_EL2_TDE;
> >> +		if (vcpu->guest_debug & KVM_GUESTDBG_USE_HW)
> >> +			trap_debug = true;
> >> +	}
> >> +
> >> +	/* Trap debug register access */
> >> +	if (trap_debug)
> >> +		vcpu->arch.mdcr_el2 |= MDCR_EL2_TDA;
> >> +
> >> +	trace_kvm_arm_set_dreg32("MDCR_EL2", vcpu->arch.mdcr_el2);
> >> +}
> >> +
> >> +/**
> >> + * kvm_arm_vcpu_init_debug - setup vcpu debug traps
> >> + *
> >> + * @vcpu:	the vcpu pointer
> >> + *
> >> + * Set vcpu initial mdcr_el2 value.
> >> + */
> >> +void kvm_arm_vcpu_init_debug(struct kvm_vcpu *vcpu)
> >> +{
> >> +	kvm_arm_setup_mdcr_el2(vcpu, this_cpu_read(mdcr_el2));
> > Given that kvm_arm_setup_mdcr_el2() always takes the current host
> > value for mdcr_el2, why not moving the read into it and be done with
> > it?
> 
> kvm_arm_setup_debug() is called with preemption disabled, and it can
> use __this_cpu_read(). kvm_arm_vcpu_init_debug() is called with
> preemption enabled, so it must use this_cpu_read(). I wanted to make
> the distinction because kvm_arm_setup_debug() is in the run loop.

I think it would be absolutely fine to make the slow path of
kvm_vcpu_first_run_init() run with preemption disabled. This happens so
rarely that it isn't worth worrying about.
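
Something like this, for example (a completely untested sketch, assuming
kvm_arm_setup_mdcr_el2() keeps taking the host value as a parameter):

void kvm_arm_vcpu_init_debug(struct kvm_vcpu *vcpu)
{
	/* Disable preemption so __this_cpu_read() is safe on first run too */
	preempt_disable();
	kvm_arm_setup_mdcr_el2(vcpu, __this_cpu_read(mdcr_el2));
	preempt_enable();
}

That way the first-run path and the run loop end up using the same
__this_cpu_read().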

Please give it a lockdep run though! ;-)

> 
> >
> > Also, do we really need an extra wrapper?
> 
> I can remove the wrapper and have kvm_arm_setup_mdcr_el2() use
> this_cpu_read() for the host's mdcr_el2 value, at the cost of a
> redundant preempt disable/enable in the run loop, where preemption is
> already disabled. If you think that would make the code easier to
> follow, I can certainly do that.
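
Which, if I understand you correctly, would boil down to something like
this (again a completely untested sketch):

static void kvm_arm_setup_mdcr_el2(struct kvm_vcpu *vcpu)
{
	/*
	 * this_cpu_read() is safe with preemption enabled, at the cost
	 * of a redundant preempt disable/enable when the run loop calls
	 * this with preemption already disabled.
	 */
	vcpu->arch.mdcr_el2 = this_cpu_read(mdcr_el2) & MDCR_EL2_HPMN_MASK;

	/* rest of the trap configuration as in your patch */
}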

As explained above, I'd rather you keep the __this_cpu_read() and make
kvm_vcpu_first_run_init() preemption safe.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.


