[PATCH v2] KVM: arm64: Initialize VCPU mdcr_el2 before loading it
Alexandru Elisei
alexandru.elisei at arm.com
Wed Mar 31 16:25:46 BST 2021
Hi Marc,
On 3/30/21 9:07 PM, Marc Zyngier wrote:
> On Tue, 30 Mar 2021 18:13:07 +0100,
> Alexandru Elisei <alexandru.elisei at arm.com> wrote:
>> Hi Marc,
>>
>> Thanks for having a look!
>>
>> On 3/30/21 10:55 AM, Marc Zyngier wrote:
>>> Hi Alex,
>>>
>>> On Tue, 23 Mar 2021 18:00:57 +0000,
>>> Alexandru Elisei <alexandru.elisei at arm.com> wrote:
>>>> When a VCPU is created, the kvm_vcpu struct is initialized to zero in
>>>> kvm_vm_ioctl_create_vcpu(). On VHE systems, the first time
>>>> vcpu.arch.mdcr_el2 is loaded on hardware is in vcpu_load(), before it is
>>>> set to a sensible value in kvm_arm_setup_debug() later in the run loop. The
>>>> result is that KVM executes for a short time with MDCR_EL2 set to zero.
>>>>
>>>> This has several unintended consequences:
>>>>
>>>> * Setting MDCR_EL2.HPMN to 0 is constrained unpredictable according to ARM
>>>> DDI 0487G.a, page D13-3820. The behavior specified by the architecture
>>>> in this case is for the PE to behave as if MDCR_EL2.HPMN is set to a
>>>> value less than or equal to PMCR_EL0.N, which means that an unknown
>>>> number of counters are now disabled by MDCR_EL2.HPME, which is zero.
>>>>
>>>> * The host configuration for the other debug features controlled by
>>>> MDCR_EL2 is temporarily lost. This has been harmless so far, as Linux
>>>> doesn't use the other fields, but that might change in the future.
>>>>
>>>> Let's avoid both issues by initializing the VCPU's mdcr_el2 field in
>>>> kvm_vcpu_first_run_init(), thus making sure that the MDCR_EL2 register
>>>> has a consistent value after each vcpu_load().
>>>>
>>>> Signed-off-by: Alexandru Elisei <alexandru.elisei at arm.com>
>>> This looks strangely similar to 4942dc6638b0 ("KVM: arm64: Write
>>> arch.mdcr_el2 changes since last vcpu_load on VHE"), just at a
>>> different point. Probably worth a Fixes tag.
>> This bug is present in the commit you are mentioning, and from what
>> I can tell it's also present in the commit it's fixing (d5a21bcc2995
>> ("KVM: arm64: Move common VHE/non-VHE trap config in separate
>> functions")) - vcpu->arch.mdcr_el2 is computed in
>> kvm_arm_setup_debug(), which is called after vcpu_load(). My guess
>> is that this bug dates from when VHE support was added (or soon after).
> Right. Can you please add a Fixes: tag for the same commit? At least
> that'd be consistent.
Yes, I'll do that.
>
>> I can dig further; how far back in time should I aim for?
>>
>>>> ---
>>>> Found by code inspection. Based on v5.12-rc4.
>>>>
>>>> Tested on an odroid-c4 with VHE. vcpu->arch.mdcr_el2 is calculated to be
>>>> 0x4e66. Without this patch, reading MDCR_EL2 after the first vcpu_load() in
>>>> kvm_arch_vcpu_ioctl_run() returns 0; with this patch it returns the correct
>>>> value, 0xe66 (FEAT_SPE is not implemented by the PE).
>>>>
>>>> This patch was initially part of the KVM SPE series [1], but those patches
>>>> haven't seen much activity, so I thought it would be a good idea to send
>>>> this patch separately to draw more attention to it.
>>>>
>>>> Changes in v2:
>>>> * Moved kvm_arm_vcpu_init_debug() earlier in kvm_vcpu_first_run_init() so
>>>> vcpu->arch.mdcr_el2 is calculated even if kvm_vgic_map_resources() fails.
>>>> * Added comment to kvm_arm_setup_mdcr_el2 to explain what testing
>>>> vcpu->guest_debug means.
>>>>
>>>> [1] https://www.spinics.net/lists/kvm-arm/msg42959.html
>>>>
>>>> arch/arm64/include/asm/kvm_host.h | 1 +
>>>> arch/arm64/kvm/arm.c | 3 +-
>>>> arch/arm64/kvm/debug.c | 82 +++++++++++++++++++++----------
>>>> 3 files changed, 59 insertions(+), 27 deletions(-)
>>>>
>>>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>>>> index 3d10e6527f7d..858c2fcfc043 100644
>>>> --- a/arch/arm64/include/asm/kvm_host.h
>>>> +++ b/arch/arm64/include/asm/kvm_host.h
>>>> @@ -713,6 +713,7 @@ static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
>>>> static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {}
>>>>
>>>> void kvm_arm_init_debug(void);
>>>> +void kvm_arm_vcpu_init_debug(struct kvm_vcpu *vcpu);
>>>> void kvm_arm_setup_debug(struct kvm_vcpu *vcpu);
>>>> void kvm_arm_clear_debug(struct kvm_vcpu *vcpu);
>>>> void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu);
>>>> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
>>>> index 7f06ba76698d..7088d8fe7186 100644
>>>> --- a/arch/arm64/kvm/arm.c
>>>> +++ b/arch/arm64/kvm/arm.c
>>>> @@ -580,6 +580,8 @@ static int kvm_vcpu_first_run_init(struct kvm_vcpu *vcpu)
>>>>
>>>> vcpu->arch.has_run_once = true;
>>>>
>>>> + kvm_arm_vcpu_init_debug(vcpu);
>>>> +
>>>> if (likely(irqchip_in_kernel(kvm))) {
>>>> /*
>>>> * Map the VGIC hardware resources before running a vcpu the
>>>> @@ -791,7 +793,6 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
>>>> }
>>>>
>>>> kvm_arm_setup_debug(vcpu);
>>>> -
>>> Spurious change?
>> Definitely, thank you for spotting it.
>>
>>>> /**************************************************************
>>>> * Enter the guest
>>>> */
>>>> diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
>>>> index 7a7e425616b5..3626d03354f6 100644
>>>> --- a/arch/arm64/kvm/debug.c
>>>> +++ b/arch/arm64/kvm/debug.c
>>>> @@ -68,6 +68,60 @@ void kvm_arm_init_debug(void)
>>>> __this_cpu_write(mdcr_el2, kvm_call_hyp_ret(__kvm_get_mdcr_el2));
>>>> }
>>>>
>>>> +/**
>>>> + * kvm_arm_setup_mdcr_el2 - configure vcpu mdcr_el2 value
>>>> + *
>>>> + * @vcpu: the vcpu pointer
>>>> + * @host_mdcr: host mdcr_el2 value
>>>> + *
>>>> + * This ensures we will trap access to:
>>>> + * - Performance monitors (MDCR_EL2_TPM/MDCR_EL2_TPMCR)
>>>> + * - Debug ROM Address (MDCR_EL2_TDRA)
>>>> + * - OS related registers (MDCR_EL2_TDOSA)
>>>> + * - Statistical profiler (MDCR_EL2_TPMS/MDCR_EL2_E2PB)
>>>> + */
>>>> +static void kvm_arm_setup_mdcr_el2(struct kvm_vcpu *vcpu, u32 host_mdcr)
>>>> +{
>>>> + bool trap_debug = !(vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY);
>>>> +
>>>> + /*
>>>> + * This also clears MDCR_EL2_E2PB_MASK to disable guest access
>>>> + * to the profiling buffer.
>>>> + */
>>>> + vcpu->arch.mdcr_el2 = host_mdcr & MDCR_EL2_HPMN_MASK;
>>>> + vcpu->arch.mdcr_el2 |= (MDCR_EL2_TPM |
>>>> + MDCR_EL2_TPMS |
>>>> + MDCR_EL2_TPMCR |
>>>> + MDCR_EL2_TDRA |
>>>> + MDCR_EL2_TDOSA);
>>>> +
>>>> + /* Is the VM being debugged by userspace? */
>>>> + if (vcpu->guest_debug) {
>>>> + /* Route all software debug exceptions to EL2 */
>>>> + vcpu->arch.mdcr_el2 |= MDCR_EL2_TDE;
>>>> + if (vcpu->guest_debug & KVM_GUESTDBG_USE_HW)
>>>> + trap_debug = true;
>>>> + }
>>>> +
>>>> + /* Trap debug register access */
>>>> + if (trap_debug)
>>>> + vcpu->arch.mdcr_el2 |= MDCR_EL2_TDA;
>>>> +
>>>> + trace_kvm_arm_set_dreg32("MDCR_EL2", vcpu->arch.mdcr_el2);
>>>> +}
>>>> +
>>>> +/**
>>>> + * kvm_arm_vcpu_init_debug - setup vcpu debug traps
>>>> + *
>>>> + * @vcpu: the vcpu pointer
>>>> + *
>>>> + * Set vcpu initial mdcr_el2 value.
>>>> + */
>>>> +void kvm_arm_vcpu_init_debug(struct kvm_vcpu *vcpu)
>>>> +{
>>>> + kvm_arm_setup_mdcr_el2(vcpu, this_cpu_read(mdcr_el2));
>>> Given that kvm_arm_setup_mdcr_el2() always takes the current host
>>> value for mdcr_el2, why not moving the read into it and be done with
>>> it?
>> kvm_arm_setup_debug() is called with preemption disabled, and it can
>> use __this_cpu_read(). kvm_arm_vcpu_init_debug() is called with
>> preemption enabled, so it must use this_cpu_read(). I wanted to make
>> the distinction because kvm_arm_setup_debug() is in the run loop.
> I think it would be absolutely fine to make the slow path of
> kvm_vcpu_first_run_init() run with preempt disabled. This happens so
> rarely that it isn't worth thinking about.
It looks to me like running the entire kvm_vcpu_first_run_init() function with
preemption disabled just for the __this_cpu_read() in kvm_arm_setup_mdcr_el2() is
a bit heavy-handed. Not because of the performance cost (it's negligible, as the
function is called exactly once in the VCPU's lifetime), but because it wouldn't
be obvious why preemption needs to be disabled there.
I tried this:
@@ -580,7 +580,9 @@ static int kvm_vcpu_first_run_init(struct kvm_vcpu *vcpu)
vcpu->arch.has_run_once = true;
- kvm_arm_vcpu_init_debug(vcpu);
+ preempt_disable();
+ kvm_arm_setup_mdcr_el2(vcpu);
+ preempt_enable();
if (likely(irqchip_in_kernel(kvm))) {
/*
and it still looks a bit off to me, because preemption ends up being disabled due
to an implementation detail of kvm_arm_setup_mdcr_el2(): conceptually the function
operates on the VCPU struct, for which preemption can remain enabled.
I was thinking something like this:
@@ -119,7 +119,9 @@ static void kvm_arm_setup_mdcr_el2(struct kvm_vcpu *vcpu, u32 host_mdcr)
*/
void kvm_arm_vcpu_init_debug(struct kvm_vcpu *vcpu)
{
- kvm_arm_setup_mdcr_el2(vcpu, this_cpu_read(mdcr_el2));
+ preempt_disable();
+ kvm_arm_setup_mdcr_el2(vcpu);
+ preempt_enable();
}
/**
What do you think?
Thanks,
Alex
>
> Please give it a lockdep run though! ;-)
>
>>> Also, do we really need an extra wrapper?
>> I can remove the wrapper and have kvm_arm_setup_mdcr_el2() use
>> this_cpu_read() for the host's mdcr_el2 value at the cost of a
>> preempt disable/enable in the run loop, where preemption is already
>> disabled. If you think that would make the code easier to follow, I
>> can certainly do that.
> As explained above, I'd rather you keep the __this_cpu_read() and make
> kvm_vcpu_first_run_init() preemption safe.
>
> Thanks,
>
> M.
>