[PATCH v5 07/14] KVM: ARM: World-switch implementation

Gleb Natapov gleb at redhat.com
Wed Jan 16 07:57:12 EST 2013


On Tue, Jan 15, 2013 at 11:08:24PM -0500, Christoffer Dall wrote:
> On Tue, Jan 15, 2013 at 9:08 PM, Christoffer Dall
> <c.dall at virtualopensystems.com> wrote:
> > On Tue, Jan 15, 2013 at 4:43 AM, Gleb Natapov <gleb at redhat.com> wrote:
> >> On Tue, Jan 08, 2013 at 01:39:24PM -0500, Christoffer Dall wrote:
> >>> Provides a complete world-switch implementation to switch to other guests
> >>> running in non-secure modes. Includes Hyp exception handlers that
> >>> capture the necessary exception information and store it in the VCPU
> >>> and KVM structures.
> >>>
> >>> The following Hyp-ABI is also documented in the code:
> >>>
> >>> Hyp-ABI: Calling HYP-mode functions from host (in SVC mode):
> >>>    Switching to Hyp mode is done through a simple HVC #0 instruction. The
> >>>    exception vector code will check that the HVC comes from VMID==0 and if
> >>>    so will push the necessary state (SPSR, lr_usr) on the Hyp stack.
> >>>    - r0 contains a pointer to a HYP function
> >>>    - r1, r2, and r3 contain arguments to the above function.
> >>>    - The HYP function will be called with its arguments in r0, r1 and r2.
> >>>    On HYP function return, we return directly to SVC.
> >>>
> >>> A call to a function executing in Hyp mode is performed like the following:
> >>>
> >>>         <svc code>
> >>>         ldr     r0, =BSYM(my_hyp_fn)
> >>>         ldr     r1, =my_param
> >>>         hvc #0  ; Call my_hyp_fn(my_param) from HYP mode
> >>>         <svc code>
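For completeness, the same thing can be done from C via the kvm_call_hyp()
helper this patch declares; a trivial, hedged sketch (my_hyp_fn/my_param are
just the placeholders from the example above):

        /* r0 = HYP function pointer, r1-r3 = arguments, per the Hyp-ABI above */
        u64 ret = kvm_call_hyp(my_hyp_fn, my_param);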
> >>>
> >>> Otherwise, the world-switch is pretty straightforward. All state that
> >>> can be modified by the guest is first backed up on the Hyp stack and the
> >>> VCPU values are loaded onto the hardware. State that is not loaded but is
> >>> theoretically modifiable by the guest is protected through the
> >>> virtualization features to generate a trap and cause software emulation.
> >>> Upon guest return, all state is restored from the hardware onto the VCPU
> >>> struct and the original state is restored from the Hyp stack onto the
> >>> hardware.
> >>>
> >>> SMP support, using a VMPIDR calculated from the host MPIDR with the low
> >>> bits overridden by the KVM vcpu_id, was contributed by Marc Zyngier.
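Just to illustrate what that composition looks like (a hedged sketch, not code
from this patch; the low-byte mask and the helper name are my assumptions):

        /* Guest-visible (V)MPIDR: the host's affinity bits with the low
         * byte replaced by the KVM vcpu_id. */
        static u32 compute_vmpidr(u32 host_mpidr, int vcpu_id)
        {
                return (host_mpidr & ~0xffU) | (vcpu_id & 0xff);
        }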
> >>>
> >>> Reuse of VMIDs has been implemented by Antonios Motakis and adapted from
> >>> a separate patch into the appropriate patches introducing the
> >>> functionality. Note that the VMIDs are stored per VM as required by the ARM
> >>> architecture reference manual.
> >>>
> >>> To support VFP/NEON we trap those instructions using the HCPTR. When
> >>> we trap, we switch the FPU.  After a guest exit, the VFP state is
> >>> returned to the host.  When disabling access to floating point
> >>> instructions, we also mask FPEXC_EN in order to avoid the guest
> >>> receiving Undefined instruction exceptions before we have a chance to
> >>> switch back the floating point state.  We are reusing vfp_hard_struct,
> >>> so we depend on VFPv3 being enabled in the host kernel; if it is not, we
> >>> still trap cp10 and cp11 in order to inject an undefined instruction
> >>> exception whenever the guest tries to use VFP/NEON. VFP/NEON support was
> >>> developed by Antonios Motakis and Rusty Russell.
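For readers who want the mechanics: the lazy switch relies on the HCPTR
coprocessor trap bits added below. A hedged sketch only (read_hcptr() and
write_hcptr() are made-up accessors standing in for the Hyp-mode mrc/mcr; the
real work happens in the assembly code):

        static void trap_guest_vfp(void)
        {
                u32 hcptr = read_hcptr();

                /* Trap cp10/cp11 (VFP/Advanced SIMD) accesses so the first
                 * guest FP instruction traps to Hyp and we can switch the
                 * FPU state lazily. */
                hcptr |= HCPTR_TCP(10) | HCPTR_TCP(11);
                write_hcptr(hcptr);
        }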
> >>>
> >>> Aborts that are permission faults, and not on a stage-1 page table walk,
> >>> do not report the faulting address in the HPFAR.  We have to resolve the
> >>> IPA ourselves and store it on the VCPU struct in the same format as the
> >>> HPFAR register. If the IPA cannot be resolved, it means another CPU is
> >>> playing with the page tables, and we simply restart the guest.  This
> >>> quirk was fixed by Marc Zyngier.
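For context on the "store it just like the HPFAR" part, a hedged sketch of the
packing (the bit layout follows the architecture; the helper name is mine):

        /* HPFAR[31:4] holds IPA[39:12], so a software-resolved IPA is
         * shifted into the same format before being stored in
         * vcpu->arch.hpfar. */
        static u32 ipa_to_hpfar(phys_addr_t ipa)
        {
                return (ipa >> 8) & HPFAR_MASK;
        }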
> >>>
> >>> Reviewed-by: Marcelo Tosatti <mtosatti at redhat.com>
> >>> Signed-off-by: Rusty Russell <rusty.russell at linaro.org>
> >>> Signed-off-by: Antonios Motakis <a.motakis at virtualopensystems.com>
> >>> Signed-off-by: Marc Zyngier <marc.zyngier at arm.com>
> >>> Signed-off-by: Christoffer Dall <c.dall at virtualopensystems.com>
> >>> ---
> >>>  arch/arm/include/asm/kvm_arm.h  |   51 ++++
> >>>  arch/arm/include/asm/kvm_host.h |   10 +
> >>>  arch/arm/kernel/asm-offsets.c   |   25 ++
> >>>  arch/arm/kvm/arm.c              |  187 ++++++++++++++++
> >>>  arch/arm/kvm/interrupts.S       |  396 +++++++++++++++++++++++++++++++++++
> >>>  arch/arm/kvm/interrupts_head.S  |  443 +++++++++++++++++++++++++++++++++++++++
> >>>  6 files changed, 1108 insertions(+), 4 deletions(-)
> >>>  create mode 100644 arch/arm/kvm/interrupts_head.S
> >>>
> >>> diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
> >>> index fb22ee8..a3262a2 100644
> >>> --- a/arch/arm/include/asm/kvm_arm.h
> >>> +++ b/arch/arm/include/asm/kvm_arm.h
> >>> @@ -98,6 +98,18 @@
> >>>  #define TTBCR_T0SZ   3
> >>>  #define HTCR_MASK    (TTBCR_T0SZ | TTBCR_IRGN0 | TTBCR_ORGN0 | TTBCR_SH0)
> >>>
> >>> +/* Hyp System Trap Register */
> >>> +#define HSTR_T(x)    (1 << x)
> >>> +#define HSTR_TTEE    (1 << 16)
> >>> +#define HSTR_TJDBX   (1 << 17)
> >>> +
> >>> +/* Hyp Coprocessor Trap Register */
> >>> +#define HCPTR_TCP(x) (1 << x)
> >>> +#define HCPTR_TCP_MASK       (0x3fff)
> >>> +#define HCPTR_TASE   (1 << 15)
> >>> +#define HCPTR_TTA    (1 << 20)
> >>> +#define HCPTR_TCPAC  (1 << 31)
> >>> +
> >>>  /* Hyp Debug Configuration Register bits */
> >>>  #define HDCR_TDRA    (1 << 11)
> >>>  #define HDCR_TDOSA   (1 << 10)
> >>> @@ -144,6 +156,45 @@
> >>>  #else
> >>>  #define VTTBR_X              (5 - KVM_T0SZ)
> >>>  #endif
> >>> +#define VTTBR_BADDR_SHIFT (VTTBR_X - 1)
> >>> +#define VTTBR_BADDR_MASK  (((1LLU << (40 - VTTBR_X)) - 1) << VTTBR_BADDR_SHIFT)
> >>> +#define VTTBR_VMID_SHIFT  (48LLU)
> >>> +#define VTTBR_VMID_MASK        (0xffLLU << VTTBR_VMID_SHIFT)
> >>> +
> >>> +/* Hyp Syndrome Register (HSR) bits */
> >>> +#define HSR_EC_SHIFT (26)
> >>> +#define HSR_EC               (0x3fU << HSR_EC_SHIFT)
> >>> +#define HSR_IL               (1U << 25)
> >>> +#define HSR_ISS              (HSR_IL - 1)
> >>> +#define HSR_ISV_SHIFT        (24)
> >>> +#define HSR_ISV              (1U << HSR_ISV_SHIFT)
> >>> +#define HSR_FSC              (0x3f)
> >>> +#define HSR_FSC_TYPE (0x3c)
> >>> +#define HSR_WNR              (1 << 6)
> >>> +
> >>> +#define FSC_FAULT    (0x04)
> >>> +#define FSC_PERM     (0x0c)
> >>> +
> >>> +/* Hyp Prefetch Fault Address Register (HPFAR/HDFAR) */
> >>> +#define HPFAR_MASK   (~0xf)
> >>>
> >>> +#define HSR_EC_UNKNOWN       (0x00)
> >>> +#define HSR_EC_WFI   (0x01)
> >>> +#define HSR_EC_CP15_32       (0x03)
> >>> +#define HSR_EC_CP15_64       (0x04)
> >>> +#define HSR_EC_CP14_MR       (0x05)
> >>> +#define HSR_EC_CP14_LS       (0x06)
> >>> +#define HSR_EC_CP_0_13       (0x07)
> >>> +#define HSR_EC_CP10_ID       (0x08)
> >>> +#define HSR_EC_JAZELLE       (0x09)
> >>> +#define HSR_EC_BXJ   (0x0A)
> >>> +#define HSR_EC_CP14_64       (0x0C)
> >>> +#define HSR_EC_SVC_HYP       (0x11)
> >>> +#define HSR_EC_HVC   (0x12)
> >>> +#define HSR_EC_SMC   (0x13)
> >>> +#define HSR_EC_IABT  (0x20)
> >>> +#define HSR_EC_IABT_HYP      (0x21)
> >>> +#define HSR_EC_DABT  (0x24)
> >>> +#define HSR_EC_DABT_HYP      (0x25)
> >>>
> >>>  #endif /* __ARM_KVM_ARM_H__ */
> >>> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> >>> index 1de6f0d..ddb09da 100644
> >>> --- a/arch/arm/include/asm/kvm_host.h
> >>> +++ b/arch/arm/include/asm/kvm_host.h
> >>> @@ -21,6 +21,7 @@
> >>>
> >>>  #include <asm/kvm.h>
> >>>  #include <asm/kvm_asm.h>
> >>> +#include <asm/fpstate.h>
> >>>
> >>>  #define KVM_MAX_VCPUS CONFIG_KVM_ARM_MAX_VCPUS
> >>>  #define KVM_USER_MEM_SLOTS 32
> >>> @@ -85,6 +86,14 @@ struct kvm_vcpu_arch {
> >>>       u32 hxfar;              /* Hyp Data/Inst Fault Address Register */
> >>>       u32 hpfar;              /* Hyp IPA Fault Address Register */
> >>>
> >>> +     /* Floating point registers (VFP and Advanced SIMD/NEON) */
> >>> +     struct vfp_hard_struct vfp_guest;
> >>> +     struct vfp_hard_struct *vfp_host;
> >>> +
> >>> +     /*
> >>> +      * Anything that is not used directly from assembly code goes
> >>> +      * here.
> >>> +      */
> >>>       /* Interrupt related fields */
> >>>       u32 irq_lines;          /* IRQ and FIQ levels */
> >>>
> >>> @@ -112,6 +121,7 @@ struct kvm_one_reg;
> >>>  int kvm_arm_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
> >>>  int kvm_arm_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
> >>>  u64 kvm_call_hyp(void *hypfn, ...);
> >>> +void force_vm_exit(const cpumask_t *mask);
> >>>
> >>>  #define KVM_ARCH_WANT_MMU_NOTIFIER
> >>>  struct kvm;
> >>> diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c
> >>> index c985b48..c8b3272 100644
> >>> --- a/arch/arm/kernel/asm-offsets.c
> >>> +++ b/arch/arm/kernel/asm-offsets.c
> >>> @@ -13,6 +13,9 @@
> >>>  #include <linux/sched.h>
> >>>  #include <linux/mm.h>
> >>>  #include <linux/dma-mapping.h>
> >>> +#ifdef CONFIG_KVM_ARM_HOST
> >>> +#include <linux/kvm_host.h>
> >>> +#endif
> >>>  #include <asm/cacheflush.h>
> >>>  #include <asm/glue-df.h>
> >>>  #include <asm/glue-pf.h>
> >>> @@ -146,5 +149,27 @@ int main(void)
> >>>    DEFINE(DMA_BIDIRECTIONAL,  DMA_BIDIRECTIONAL);
> >>>    DEFINE(DMA_TO_DEVICE,              DMA_TO_DEVICE);
> >>>    DEFINE(DMA_FROM_DEVICE,    DMA_FROM_DEVICE);
> >>> +#ifdef CONFIG_KVM_ARM_HOST
> >>> +  DEFINE(VCPU_KVM,           offsetof(struct kvm_vcpu, kvm));
> >>> +  DEFINE(VCPU_MIDR,          offsetof(struct kvm_vcpu, arch.midr));
> >>> +  DEFINE(VCPU_CP15,          offsetof(struct kvm_vcpu, arch.cp15));
> >>> +  DEFINE(VCPU_VFP_GUEST,     offsetof(struct kvm_vcpu, arch.vfp_guest));
> >>> +  DEFINE(VCPU_VFP_HOST,              offsetof(struct kvm_vcpu, arch.vfp_host));
> >>> +  DEFINE(VCPU_REGS,          offsetof(struct kvm_vcpu, arch.regs));
> >>> +  DEFINE(VCPU_USR_REGS,              offsetof(struct kvm_vcpu, arch.regs.usr_regs));
> >>> +  DEFINE(VCPU_SVC_REGS,              offsetof(struct kvm_vcpu, arch.regs.svc_regs));
> >>> +  DEFINE(VCPU_ABT_REGS,              offsetof(struct kvm_vcpu, arch.regs.abt_regs));
> >>> +  DEFINE(VCPU_UND_REGS,              offsetof(struct kvm_vcpu, arch.regs.und_regs));
> >>> +  DEFINE(VCPU_IRQ_REGS,              offsetof(struct kvm_vcpu, arch.regs.irq_regs));
> >>> +  DEFINE(VCPU_FIQ_REGS,              offsetof(struct kvm_vcpu, arch.regs.fiq_regs));
> >>> +  DEFINE(VCPU_PC,            offsetof(struct kvm_vcpu, arch.regs.usr_regs.ARM_pc));
> >>> +  DEFINE(VCPU_CPSR,          offsetof(struct kvm_vcpu, arch.regs.usr_regs.ARM_cpsr));
> >>> +  DEFINE(VCPU_IRQ_LINES,     offsetof(struct kvm_vcpu, arch.irq_lines));
> >>> +  DEFINE(VCPU_HSR,           offsetof(struct kvm_vcpu, arch.hsr));
> >>> +  DEFINE(VCPU_HxFAR,         offsetof(struct kvm_vcpu, arch.hxfar));
> >>> +  DEFINE(VCPU_HPFAR,         offsetof(struct kvm_vcpu, arch.hpfar));
> >>> +  DEFINE(VCPU_HYP_PC,                offsetof(struct kvm_vcpu, arch.hyp_pc));
> >>> +  DEFINE(KVM_VTTBR,          offsetof(struct kvm, arch.vttbr));
> >>> +#endif
> >>>    return 0;
> >>>  }
> >>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> >>> index 9b4566e..c94d278 100644
> >>> --- a/arch/arm/kvm/arm.c
> >>> +++ b/arch/arm/kvm/arm.c
> >>> @@ -40,6 +40,7 @@
> >>>  #include <asm/kvm_arm.h>
> >>>  #include <asm/kvm_asm.h>
> >>>  #include <asm/kvm_mmu.h>
> >>> +#include <asm/kvm_emulate.h>
> >>>
> >>>  #ifdef REQUIRES_VIRT
> >>>  __asm__(".arch_extension     virt");
> >>> @@ -49,6 +50,10 @@ static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
> >>>  static struct vfp_hard_struct __percpu *kvm_host_vfp_state;
> >>>  static unsigned long hyp_default_vectors;
> >>>
> >>> +/* The VMID used in the VTTBR */
> >>> +static atomic64_t kvm_vmid_gen = ATOMIC64_INIT(1);
> >>> +static u8 kvm_next_vmid;
> >>> +static DEFINE_SPINLOCK(kvm_vmid_lock);
> >>>
> >>>  int kvm_arch_hardware_enable(void *garbage)
> >>>  {
> >>> @@ -276,6 +281,8 @@ int __attribute_const__ kvm_target_cpu(void)
> >>>
> >>>  int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
> >>>  {
> >>> +     /* Force users to call KVM_ARM_VCPU_INIT */
> >>> +     vcpu->arch.target = -1;
> >>>       return 0;
> >>>  }
> >>>
> >>> @@ -286,6 +293,7 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
> >>>  void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
> >>>  {
> >>>       vcpu->cpu = cpu;
> >>> +     vcpu->arch.vfp_host = this_cpu_ptr(kvm_host_vfp_state);
> >>>  }
> >>>
> >>>  void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
> >>> @@ -318,12 +326,189 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
> >>>
> >>>  int kvm_arch_vcpu_in_guest_mode(struct kvm_vcpu *v)
> >> As far as I see the function is unused.
> >>
> >>>  {
> >>> +     return v->mode == IN_GUEST_MODE;
> >>> +}
> >>> +
> >>> +/* Just ensure a guest exit from a particular CPU */
> >>> +static void exit_vm_noop(void *info)
> >>> +{
> >>> +}
> >>> +
> >>> +void force_vm_exit(const cpumask_t *mask)
> >>> +{
> >>> +     smp_call_function_many(mask, exit_vm_noop, NULL, true);
> >>> +}
> >> There is make_all_cpus_request() for that. It actually sends IPIs only
> >> to cpus that are running vcpus.
> >>
> >>> +
> >>> +/**
> >>> + * need_new_vmid_gen - check that the VMID is still valid
> >>> + * @kvm: The VM's VMID to check
> >>> + *
> >>> + * return true if there is a new generation of VMIDs being used
> >>> + *
> >>> + * The hardware supports only 256 values with the value zero reserved for the
> >>> + * host, so we check if an assigned value belongs to a previous generation,
> >>> + * which requires us to assign a new value. If we're the first to use a
> >>> + * VMID for the new generation, we must flush necessary caches and TLBs on all
> >>> + * CPUs.
> >>> + */
> >>> +static bool need_new_vmid_gen(struct kvm *kvm)
> >>> +{
> >>> +     return unlikely(kvm->arch.vmid_gen != atomic64_read(&kvm_vmid_gen));
> >>> +}
> >>> +
> >>> +/**
> >>> + * update_vttbr - Update the VTTBR with a valid VMID before the guest runs
> >>> + * @kvm:     The guest that we are about to run
> >>> + *
> >>> + * Called from kvm_arch_vcpu_ioctl_run before entering the guest to ensure the
> >>> + * VM has a valid VMID, otherwise assigns a new one and flushes corresponding
> >>> + * caches and TLBs.
> >>> + */
> >>> +static void update_vttbr(struct kvm *kvm)
> >>> +{
> >>> +     phys_addr_t pgd_phys;
> >>> +     u64 vmid;
> >>> +
> >>> +     if (!need_new_vmid_gen(kvm))
> >>> +             return;
> >>> +
> >>> +     spin_lock(&kvm_vmid_lock);
> >>> +
> >>> +     /*
> >>> +      * We need to re-check the vmid_gen here to ensure that if another vcpu
> >>> +      * already allocated a valid vmid for this vm, then this vcpu should
> >>> +      * use the same vmid.
> >>> +      */
> >>> +     if (!need_new_vmid_gen(kvm)) {
> >>> +             spin_unlock(&kvm_vmid_lock);
> >>> +             return;
> >>> +     }
> >>> +
> >>> +     /* First user of a new VMID generation? */
> >>> +     if (unlikely(kvm_next_vmid == 0)) {
> >>> +             atomic64_inc(&kvm_vmid_gen);
> >>> +             kvm_next_vmid = 1;
> >>> +
> >>> +             /*
> >>> +              * On SMP we know no other CPUs can use this CPU's or each
> >>> +              * other's VMID after force_vm_exit returns since the
> >>> +              * kvm_vmid_lock blocks them from reentry to the guest.
> >>> +              */
> >>> +             force_vm_exit(cpu_all_mask);
> >>> +             /*
> >>> +              * Now broadcast TLB + ICACHE invalidation over the inner
> >>> +              * shareable domain to make sure all data structures are
> >>> +              * clean.
> >>> +              */
> >>> +             kvm_call_hyp(__kvm_flush_vm_context);
> >>> +     }
> >>> +
> >>> +     kvm->arch.vmid_gen = atomic64_read(&kvm_vmid_gen);
> >>> +     kvm->arch.vmid = kvm_next_vmid;
> >>> +     kvm_next_vmid++;
> >>> +
> >>> +     /* update vttbr to be used with the new vmid */
> >>> +     pgd_phys = virt_to_phys(kvm->arch.pgd);
> >>> +     vmid = ((u64)(kvm->arch.vmid) << VTTBR_VMID_SHIFT) & VTTBR_VMID_MASK;
> >>> +     kvm->arch.vttbr = pgd_phys & VTTBR_BADDR_MASK;
> >>> +     kvm->arch.vttbr |= vmid;
> >>> +
> >>> +     spin_unlock(&kvm_vmid_lock);
> >>> +}
> >>> +
> >>> +/*
> >>> + * Return > 0 to return to guest, < 0 on error, 0 (and set exit_reason) on
> >>> + * proper exit to QEMU.
> >>> + */
> >>> +static int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
> >>> +                    int exception_index)
> >>> +{
> >>> +     run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
> >>>       return 0;
> >>>  }
> >>>
> >>> +/**
> >>> + * kvm_arch_vcpu_ioctl_run - the main VCPU run function to execute guest code
> >>> + * @vcpu:    The VCPU pointer
> >>> + * @run:     The kvm_run structure pointer used for userspace state exchange
> >>> + *
> >>> + * This function is called through the VCPU_RUN ioctl called from user space. It
> >>> + * will execute VM code in a loop until the time slice for the process is used
> >>> + * or some emulation is needed from user space in which case the function will
> >>> + * return with return value 0 and with the kvm_run structure filled in with the
> >>> + * required data for the requested emulation.
> >>> + */
> >>>  int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> >>>  {
> >>> -     return -EINVAL;
> >>> +     int ret;
> >>> +     sigset_t sigsaved;
> >>> +
> >>> +     /* Make sure they initialize the vcpu with KVM_ARM_VCPU_INIT */
> >>> +     if (unlikely(vcpu->arch.target < 0))
> >>> +             return -ENOEXEC;
> >>> +
> >>> +     if (vcpu->sigset_active)
> >>> +             sigprocmask(SIG_SETMASK, &vcpu->sigset, &sigsaved);
> >>> +
> >>> +     ret = 1;
> >>> +     run->exit_reason = KVM_EXIT_UNKNOWN;
> >>> +     while (ret > 0) {
> >>> +             /*
> >>> +              * Check conditions before entering the guest
> >>> +              */
> >>> +             cond_resched();
> >>> +
> >>> +             update_vttbr(vcpu->kvm);
> >>> +
> >>> +             local_irq_disable();
> >>> +
> >>> +             /*
> >>> +              * Re-check atomic conditions
> >>> +              */
> >>> +             if (signal_pending(current)) {
> >>> +                     ret = -EINTR;
> >>> +                     run->exit_reason = KVM_EXIT_INTR;
> >>> +             }
> >>> +
> >>> +             if (ret <= 0 || need_new_vmid_gen(vcpu->kvm)) {
> >>> +                     local_irq_enable();
> >>> +                     continue;
> >>> +             }
> >>> +
> >>> +             /**************************************************************
> >>> +              * Enter the guest
> >>> +              */
> >>> +             trace_kvm_entry(*vcpu_pc(vcpu));
> >>> +             kvm_guest_enter();
> >>> +             vcpu->mode = IN_GUEST_MODE;
> >> You need to set mode to IN_GUEST_MODE before disabling interrupts and
> >> check that mode != EXITING_GUEST_MODE after disabling interrupts but
> >> before entering the guest. This way you will catch kicks that were sent
> >> between setting the mode and disabling the interrupts. Also you need
> >> to check vcpu->requests and exit if it is not empty. I see that you do
> >> not use vcpu->requests at all, but you should since common kvm code
> >> assumes that it is used. make_all_cpus_request() uses it for instance.
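To make the ordering I mean concrete, a minimal sketch using the generic
definitions from <linux/kvm_host.h> (the helper name is mine, this is not the
ARM patch itself):

        static bool vcpu_ready_for_guest_entry(struct kvm_vcpu *vcpu)
        {
                vcpu->mode = IN_GUEST_MODE;
                smp_mb();       /* order the mode write against the checks below */
                local_irq_disable();

                /*
                 * A kick sent after the mode was set is now visible either
                 * as EXITING_GUEST_MODE or as a pending request/signal.
                 */
                if (vcpu->mode == EXITING_GUEST_MODE || vcpu->requests ||
                    signal_pending(current)) {
                        vcpu->mode = OUTSIDE_GUEST_MODE;
                        local_irq_enable();
                        return false;           /* redo the outer loop */
                }
                return true;                    /* safe to enter the guest */
        }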
> >>
> >
> > I don't quite agree, but almost:
> >
> > Why would you set IN_GUEST_MODE before disabling interrupts? The only
> > reason I can see for it to be a requirement is to leverage an implicit
> > memory barrier. Receiving the IPI in this little window does nothing
> > (the smp_cross_call is a noop).
> >
> > Checking that mode != EXITING_GUEST_MODE is equally useless in my
> > opinion; as I read the requests code, the only reason for this mode is
> > to avoid sending an IPI twice.
> >
> > Kicks sent between setting the mode and disabling the interrupts are
> > not the point; the point is to check the requests field (which we
> > don't use at all on ARM, and generic code also doesn't use on ARM)
> > after disabling interrupts, and after setting IN_GUEST_MODE.
> >
> > The patch below fixes your issues, and while I would push back on
> > anything other than direct bug fixes at this point, the current code is
> > semantically incorrect wrt. KVM vcpu requests, so it's worth a fix,
> > and the patch itself is trivial.
> >
> [...]
> 
> Actually, I take that back, the kvm_vcpu_block function does make a
> request, which we don't need to handle, so adding code that checks for
> features we don't support is useless at this point. Please ignore the
> patch I sent earlier.
> 
The archs that are not interested in the unhalt request just clear it after
returning from kvm_vcpu_block().
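Something like this right after the call, for example (kvm_check_request()
tests and clears the bit in one go):

        kvm_vcpu_block(vcpu);
        kvm_check_request(KVM_REQ_UNHALT, vcpu);        /* not interested, just clear it */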

> Later on we can change some of the code to use the vcpu->features map
> if there's a real benefit, but right now the priority is to merge this
> code, so anything that's not a bugfix should not go in.
> 

Agree. Let's merge it and change it later. The vcpu run loop is simple
enough at this point. Using vcpu->requests is not a question of "real
benefit", though: of course you can introduce your own mechanism to pass
requests to vcpus instead of using what kvm already provides, but from a
maintenance and code-sharing point of view that is the wrong thing to do.
Look at this code, for instance:

        /* Kick out any which are still running. */
        kvm_for_each_vcpu(i, v, vcpu->kvm) {
                /* Guest could exit now, making cpu wrong. That's OK. */
                if (kvm_vcpu_exiting_guest_mode(v) == IN_GUEST_MODE) {
                        force_vm_exit(get_cpu_mask(v->cpu));
                }
        }

Why not make_all_cpus_request(vcpu->kvm, KVM_REQ_PAUSE)?

And I am not sure KVM_REQ_UNHALT is so useless to you in the first
place. kvm_vcpu_block() can return even when the vcpu is not runnable (if
a signal is pending). KVM_REQ_UNHALT is the way to check for that. Hmm,
this actually looks like a BUG in the current code.
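I.e., something along these lines after waiting (sketch only, names from the
generic KVM code):

        kvm_vcpu_block(vcpu);
        if (!kvm_check_request(KVM_REQ_UNHALT, vcpu)) {
                /* Woken without becoming runnable, most likely by a pending
                 * signal: go back to userspace instead of re-entering the
                 * guest. */
                run->exit_reason = KVM_EXIT_INTR;
                return -EINTR;
        }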

--
			Gleb.


