[PATCH 2/3] KVM: arm64: nv: Emulate ISTATUS when emulated timers are fired.
Ganapatrao Kulkarni
gankulkarni at os.amperecomputing.com
Tue Jan 10 00:41:44 PST 2023
On 02-01-2023 05:16 pm, Marc Zyngier wrote:
> On Thu, 29 Dec 2022 13:53:15 +0000,
> Marc Zyngier <maz at kernel.org> wrote:
>>
>> On Wed, 24 Aug 2022 07:03:03 +0100,
>> Ganapatrao Kulkarni <gankulkarni at os.amperecomputing.com> wrote:
>>>
>>> The Guest-Hypervisor forwards the timer interrupt to the Guest-Guest if
>>> the timer is loaded, enabled and unmasked, and the ISTATUS bit of
>>> CNTV_CTL_EL0 is set.
>>>
>>> With NV2, the Host-Hypervisor does not emulate the ISTATUS bit when it
>>> forwards the emulated vtimer interrupt to the Guest-Hypervisor. As a
>>> result, the Guest-Hypervisor drops the interrupt, whereas the
>>> Host-Hypervisor has marked it as active and expects the Guest-Guest to
>>> consume and acknowledge it. Because of this, some Guest-Guest vCPUs get
>>> stuck in the idle thread and RCU soft lockups are seen.
>>>
>>> This issue is not seen in the NV1 case, since the CNTV_CTL_EL0 read
>>> trap handler emulates the ISTATUS bit.
>>>
>>> Add code to set (emulate) ISTATUS when the emulated timers fire.
>>>
>>> Signed-off-by: Ganapatrao Kulkarni <gankulkarni at os.amperecomputing.com>
>>> ---
>>> arch/arm64/kvm/arch_timer.c | 5 +++++
>>> 1 file changed, 5 insertions(+)
>>>
>>> diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
>>> index 27a6ec46803a..0b32d943d2d5 100644
>>> --- a/arch/arm64/kvm/arch_timer.c
>>> +++ b/arch/arm64/kvm/arch_timer.c
>>> @@ -63,6 +63,7 @@ static u64 kvm_arm_timer_read(struct kvm_vcpu *vcpu,
>>>  			      struct arch_timer_context *timer,
>>>  			      enum kvm_arch_timer_regs treg);
>>>  static bool kvm_arch_timer_get_input_level(int vintid);
>>> +static u64 read_timer_ctl(struct arch_timer_context *timer);
>>>
>>>  static struct irq_ops arch_timer_irq_ops = {
>>>  	.get_input_level = kvm_arch_timer_get_input_level,
>>> @@ -356,6 +357,8 @@ static enum hrtimer_restart kvm_hrtimer_expire(struct hrtimer *hrt)
>>>  		return HRTIMER_RESTART;
>>>  	}
>>>
>>> +	/* Timer emulated, emulate ISTATUS also */
>>> +	timer_set_ctl(ctx, read_timer_ctl(ctx));
>>
>> Why should we do that for non-NV2 configurations?
>>
>>>  	kvm_timer_update_irq(vcpu, true, ctx);
>>>  	return HRTIMER_NORESTART;
>>>  }
>>> @@ -458,6 +461,8 @@ static void timer_emulate(struct arch_timer_context *ctx)
>>>  	trace_kvm_timer_emulate(ctx, should_fire);
>>>
>>>  	if (should_fire != ctx->irq.level) {
>>> +		/* Timer emulated, emulate ISTATUS also */
>>> +		timer_set_ctl(ctx, read_timer_ctl(ctx));
>>>  		kvm_timer_update_irq(ctx->vcpu, should_fire, ctx);
>>>  		return;
>>>  	}
>>
>> I'm not overly keen on this. Yes, we can set the status bit there. But
>> conversely, the bit will not get cleared when the guest reprograms the
>> timer, and will take a full exit/entry cycle for it to appear.
>>
>> Ergo, the architecture is buggy as memory (the VNCR page) cannot be
>> used to emulate something as dynamic as a timer.
>>
>> It is only with FEAT_ECV that we can solve this correctly by trapping
>> the counter/timer accesses and emulate them for the guest hypervisor.
>> I'd rather we add support for that, as I expect all the FEAT_NV2
>> implementations to have it (and hopefully FEAT_FGT as well).
>
> So I went ahead and implemented some very basic FEAT_ECV support to
> correctly emulate the timers (trapping the CTL/CVAL accesses).
>
> Performance dropped like a rock (~30% extra overhead) for L2
> exit-heavy workloads that are terminated in userspace, such as virtio.
> For those workloads, vcpu_{load,put}() in L1 now generate extra traps,
> as we save/restore the timer context, and this is enough to make
> things visibly slower, even on a pretty fast machine.
>
> I managed to get *some* performance back by satisfying CTL/CVAL reads
> very early on the exit path (a pretty common theme with NV), which
> means we end up needing something like what you have, only a bit
> more complete. I came up with the following:
Yes, this is more appropriate; it moves the ISTATUS update to a single place.
>
> diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
> index 4945c5b96f05..a198a6211e2a 100644
> --- a/arch/arm64/kvm/arch_timer.c
> +++ b/arch/arm64/kvm/arch_timer.c
> @@ -450,6 +450,25 @@ static void kvm_timer_update_irq(struct kvm_vcpu *vcpu, bool new_level,
>  {
>  	int ret;
>
> +	/*
> +	 * Paper over NV2 brokenness by publishing the interrupt status
> +	 * bit. This still results in a poor quality of emulation (guest
> +	 * writes will have no effect until the next exit).
> +	 *
> +	 * But hey, it's fast, right?
> +	 */
> +	if (vcpu_has_nv2(vcpu) && is_hyp_ctxt(vcpu) &&
> +	    (timer_ctx == vcpu_vtimer(vcpu) || timer_ctx == vcpu_ptimer(vcpu))) {
> +		u32 ctl = timer_get_ctl(timer_ctx);
> +
> +		if (new_level)
> +			ctl |= ARCH_TIMER_CTRL_IT_STAT;
> +		else
> +			ctl &= ~ARCH_TIMER_CTRL_IT_STAT;
> +
> +		timer_set_ctl(timer_ctx, ctl);
> +	}
> +
>  	timer_ctx->irq.level = new_level;
>  	trace_kvm_timer_update_irq(vcpu->vcpu_id, timer_ctx->irq.irq,
>  				   timer_ctx->irq.level);
>
> which reports the interrupt state in all cases.
>
> Does this work for you?
This works.
Are you going to pull this diff into your 6.2-nv tree, or do you
want me to send an updated patch?
>
> Thanks,
>
> M.
>
Thanks,
Ganapat