[bug report] KVM: arm64: vgic-its: Performance degradation on GICv3 LPI injection

Marc Zyngier maz at kernel.org
Thu Oct 24 01:00:56 PDT 2024


On Thu, 24 Oct 2024 06:06:58 +0100,
Zhiqiang Ni <nizhiqiang1 at huawei.com> wrote:
> 
> Hi all,
> 
> I found a performance degradation in GICv3 LPI injection after
> commit ad362fe07fecf0aba839ff2cc59a3617bd42c33f ("KVM: arm64: vgic-its:
> Avoid potential UAF in LPI translation cache").
>
> In my test case, the VM is configured with 60 vCPUs, 120G of RAM and
> a 32-queue NIC, and the kernel version is 5.10. After this patch, the
> number of new TCP connections per second dropped from 150,000 to
> 50,000, while the CPU %sys rose from 15% to 85%.

What is the VM running? How is the traffic generated? Without a
reproducer, I struggle to see how we are going to analyse this issue.

We can't go back to the previous situation anyway, as it has been
shown that what we had before was simply unsafe (the commit message
explains why).

> From ftrace, I found that vgic_put_irq() takes 13.320 us, which may
> be the reason for the performance degradation.
>
> The call stack looks like below:
>     kvm_arch_set_irq_inatomic()
>       vgic_has_its()
>       vgic_its_inject_cached_translation()
>         vgic_its_check_cache()
>         vgic_queue_irq_unlock()
>         vgic_put_irq()

Are you suggesting that it is the combination of vgic_get_irq_kref() +
vgic_put_irq() that leads to excessive latency? Both are essentially
atomic operations, which should be pretty cheap on a modern CPU
(anything with FEAT_LSE).
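To illustrate why a reference get/put pair on its own should be cheap, here is a minimal userspace sketch using C11 atomics. It is a hypothetical, simplified model and not the kernel's vgic code: the names struct lpi, lpi_alloc(), lpi_get(), lpi_put() and inject_cached() are made up for illustration. The fast path is one atomic increment and one atomic decrement, and only the final put does any real work.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for a refcounted LPI descriptor.
 * NOT the kernel's struct vgic_irq; only the refcount is modelled. */
struct lpi {
	unsigned int intid;
	atomic_uint refcount;	/* one count per holder of a reference */
};

/* Allocate a descriptor holding a single initial reference. */
static struct lpi *lpi_alloc(unsigned int intid)
{
	struct lpi *irq = malloc(sizeof(*irq));

	if (!irq)
		return NULL;
	irq->intid = intid;
	atomic_init(&irq->refcount, 1);
	return irq;
}

/* Take an extra reference: a single atomic read-modify-write. On a CPU
 * with FEAT_LSE, a compiler targeting it can emit an LDADD for this. */
static void lpi_get(struct lpi *irq)
{
	atomic_fetch_add_explicit(&irq->refcount, 1, memory_order_relaxed);
}

/* Drop a reference. Returns true if this was the last one and the
 * descriptor was freed, false otherwise (the common case). */
static bool lpi_put(struct lpi *irq)
{
	if (atomic_fetch_sub_explicit(&irq->refcount, 1,
				      memory_order_release) != 1)
		return false;
	/* Pair with the release above before tearing the object down. */
	atomic_thread_fence(memory_order_acquire);
	free(irq);
	return true;
}

/* Hypothetical shape of the cached-injection fast path in the call
 * stack quoted above: take a reference on a cache hit, "queue" the
 * interrupt (modelled here as just reading its INTID), then drop
 * the reference again. */
static unsigned int inject_cached(struct lpi *cached)
{
	unsigned int queued;

	lpi_get(cached);
	queued = cached->intid;
	lpi_put(cached);
	return queued;
}
```

The relaxed increment plus a release decrement followed by an acquire fence before freeing is the usual pattern for reference counts. This sketch only models the refcount itself; it says nothing about whatever else the real vgic_put_irq() does on a given kernel version.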

The patch below indicates that you are looking at a rather old kernel
(6.8). What is the result on a more recent kernel (from 6.10)?

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.


