[RFC] KVM: arm64: Don't force broadcast tlbi when guest is running

Marc Zyngier maz at kernel.org
Fri Oct 23 05:17:25 EDT 2020


On 2020-10-23 02:26, Shaokun Zhang wrote:
> Hi Marc,
> 
> 在 2020/10/22 20:03, Marc Zyngier 写道:
>> On 2020-10-22 02:57, Shaokun Zhang wrote:
>>> From: Nianyao Tang <tangnianyao at huawei.com>
>>> 
>>> Now HCR_EL2.FB is set to force broadcast tlbi to all online pcpus,
>>> even those vcpu do not resident on. It would get worse as system
>>> gets larger and more pcpus get online.
>>> Let's disable force-broadcast. We flush tlbi when move vcpu to a
>>> new pcpu, in case old page mapping still exists on new pcpu.
>>> 
>>> Cc: Marc Zyngier <maz at kernel.org>
>>> Signed-off-by: Nianyao Tang <tangnianyao at huawei.com>
>>> Signed-off-by: Shaokun Zhang <zhangshaokun at hisilicon.com>
>>> ---
>>>  arch/arm64/include/asm/kvm_arm.h | 2 +-
>>>  arch/arm64/kvm/arm.c             | 4 +++-
>>>  2 files changed, 4 insertions(+), 2 deletions(-)
>>> 
>>> diff --git a/arch/arm64/include/asm/kvm_arm.h 
>>> b/arch/arm64/include/asm/kvm_arm.h
>>> index 64ce29378467..f85ea9c649cb 100644
>>> --- a/arch/arm64/include/asm/kvm_arm.h
>>> +++ b/arch/arm64/include/asm/kvm_arm.h
>>> @@ -75,7 +75,7 @@
>>>   * PTW:        Take a stage2 fault if a stage1 walk steps in device 
>>> memory
>>>   */
>>>  #define HCR_GUEST_FLAGS (HCR_TSC | HCR_TSW | HCR_TWE | HCR_TWI | 
>>> HCR_VM | \
>>> -             HCR_BSU_IS | HCR_FB | HCR_TAC | \
>>> +             HCR_BSU_IS | HCR_TAC | \
>>>               HCR_AMO | HCR_SWIO | HCR_TIDCP | HCR_RW | HCR_TLOR | \
>>>               HCR_FMO | HCR_IMO | HCR_PTW )
>>>  #define HCR_VIRT_EXCP_MASK (HCR_VSE | HCR_VI | HCR_VF)
>>> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
>>> index acf9a993dfb6..845be911f885 100644
>>> --- a/arch/arm64/kvm/arm.c
>>> +++ b/arch/arm64/kvm/arm.c
>>> @@ -334,8 +334,10 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, 
>>> int cpu)
>>>      /*
>>>       * We might get preempted before the vCPU actually runs, but
>>>       * over-invalidation doesn't affect correctness.
>>> +     * Dirty tlb might still exist when vcpu ran on other pcpu
>>> +     * and modified page mapping.
>>>       */
>>> -    if (*last_ran != vcpu->vcpu_id) {
>>> +    if (*last_ran != vcpu->vcpu_id || vcpu->cpu != cpu) {
>>>          kvm_call_hyp(__kvm_tlb_flush_local_vmid, mmu);
>>>          *last_ran = vcpu->vcpu_id;
>>>      }
>> 
>> This breaks uniprocessor semantics for I-cache invalidation. What 
>> could
> 
> Oops, It shall consider this.
> 
>> possibly go wrong? You also fail to provide any data that would back 
>> up
>> your claim, as usual.
> 
> Test Unixbench spawn in each 4U8G vm on Kunpeng 920:
> (1) 24 4U8G VMs, each vcpu is taskset to one pcpu to make sure
> each vm seperated.

That's a very narrow use case.

> (2) 1 4U8G VM
> Result:
> (1) 800 * 24
> (2) 3200

I don't know what these numbers are.

> By restricting DVM broadcasting across NUMA, result (1) can
> be improved to 2300 * 24. In spawn testcase, tlbiis used
> in creating process.

Linux always use TLBI IS, except in some very rare cases.
If your HW broadcasts invalidations across the whole system
without any filtering, it is bound to be bad.

> Further, we consider restricting tlbi broadcast range and make
> tlbi more accurate.
> One scheme is replacing tlbiis with ipi + tlbi + HCR_EL2.FB=0.

Ah. That old thing again. No, thank you. The right thing to do
is to build CPUs and interconnects that properly deals with
IS invalidations by not sending DVM messages to places where
things don't run (snoop filters?). I also doubt the arm64
maintainers would be sympathetic to the idea anyway.

> Consider I-cache invalidation, KVM also needs to invalid icache
> when moving vcpu to a new pcpu.
> Do we miss any cases that need HCR_EL2.FB == 1?
> Eventually we expect HCR_EL2.FB == 0 if it is possible.

I have no intention to ever set FB to 0, as this would result
in over-invalidation in the general case, and worse performance
on well designed systems.

         M.
-- 
Jazz is not dead. It just smells funny...



More information about the linux-arm-kernel mailing list