[PATCH 1/2] ARM: KVM: Yield CPU when vcpu executes a WFE

Raghavendra KT raghavendra.kt.linux at gmail.com
Tue Oct 8 07:26:11 EDT 2013


On Mon, Oct 7, 2013 at 9:10 PM, Marc Zyngier <marc.zyngier at arm.com> wrote:
> On an (even slightly) oversubscribed system, spinlocks are quickly
> becoming a bottleneck, as some vcpus are spinning, waiting for a
> lock to be released, while the vcpu holding the lock may not be
> running at all.
>
> This creates contention, and the observed slowdown is 40x for
> hackbench. No, this isn't a typo.
>
> The solution is to trap blocking WFEs and tell KVM that we're
> now spinning. This ensures that other vcpus will get a scheduling
> boost, allowing the lock to be released more quickly.
>
> From a performance point of view: hackbench 1 process 1000
>
> 2xA15 host (baseline):  1.843s
>
> 2xA15 guest w/o patch:  2.083s
> 4xA15 guest w/o patch:  80.212s
>
> 2xA15 guest w/ patch:   2.072s
> 4xA15 guest w/ patch:   3.202s
>
> So we go from a 40x degradation to 1.5x, which is vaguely more
> acceptable.
>
> Signed-off-by: Marc Zyngier <marc.zyngier at arm.com>
> ---
>  arch/arm/include/asm/kvm_arm.h | 4 +++-
>  arch/arm/kvm/handle_exit.c     | 6 +++++-
>  2 files changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
> index 64e9696..693d5b2 100644
> --- a/arch/arm/include/asm/kvm_arm.h
> +++ b/arch/arm/include/asm/kvm_arm.h
> @@ -67,7 +67,7 @@
>   */
>  #define HCR_GUEST_MASK (HCR_TSC | HCR_TSW | HCR_TWI | HCR_VM | HCR_BSU_IS | \
>                         HCR_FB | HCR_TAC | HCR_AMO | HCR_IMO | HCR_FMO | \
> -                       HCR_SWIO | HCR_TIDCP)
> +                       HCR_TWE | HCR_SWIO | HCR_TIDCP)
>  #define HCR_VIRT_EXCP_MASK (HCR_VA | HCR_VI | HCR_VF)
>
>  /* System Control Register (SCTLR) bits */
> @@ -208,6 +208,8 @@
>  #define HSR_EC_DABT    (0x24)
>  #define HSR_EC_DABT_HYP        (0x25)
>
> +#define HSR_WFI_IS_WFE         (1U << 0)
> +
>  #define HSR_HVC_IMM_MASK       ((1UL << 16) - 1)
>
>  #define HSR_DABT_S1PTW         (1U << 7)
> diff --git a/arch/arm/kvm/handle_exit.c b/arch/arm/kvm/handle_exit.c
> index df4c82d..c4c496f 100644
> --- a/arch/arm/kvm/handle_exit.c
> +++ b/arch/arm/kvm/handle_exit.c
> @@ -84,7 +84,11 @@ static int handle_dabt_hyp(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  static int kvm_handle_wfi(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  {
>         trace_kvm_wfi(*vcpu_pc(vcpu));
> -       kvm_vcpu_block(vcpu);
> +       if (kvm_vcpu_get_hsr(vcpu) & HSR_WFI_IS_WFE)
> +               kvm_vcpu_on_spin(vcpu);

Could you also enable CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT for arm and
check whether the PLE handler logic helps further?
We would ideally get one more optimization folded into the PLE handler
if you enable that.
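
For reference, wiring that up would presumably be a one-line Kconfig
select, mirroring how x86 enables it (a sketch only, not tested on ARM):

```
# arch/arm/kvm/Kconfig (sketch, untested)
config KVM
	bool "Kernel-based Virtual Machine (KVM) support"
	select HAVE_KVM_CPU_RELAX_INTERCEPT
```

With that selected, kvm_vcpu_on_spin() would record the relax-intercept
state, letting the directed-yield heuristics skip vcpus that are
themselves spinning.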
