REGRESSION: Upgrading host kernel from 5.11 to 5.13 breaks QEMU guests - perf/fw_devlink/kvm
Marc Zyngier
maz at kernel.org
Sun Sep 19 06:36:46 PDT 2021
Hi Russell,
Thanks for the heads-up.
On Sat, 18 Sep 2021 17:17:45 +0100,
"Russell King (Oracle)" <linux at armlinux.org.uk> wrote:
>
> Hi,
>
> This morning, I upgraded my VM host from Debian Buster to Debian
> Bullseye. This host (Armada 8040) runs libvirt. I placed all the VMs
> into managedsave, which basically means their state is saved out by
> QEMU ready to be resumed once the upgrade is complete.
>
> Initially, I was running 5.11 on the host, which didn't have POSIX
> ACLs enabled on tmpfs. Sadly, upgrading to Bullseye now requires
> this to be enabled, so I upgraded the kernel to 5.13 to avoid this
> problem - without POSIX ACLs on tmpfs, qemu refuses to even start.
>
> Under Bullseye with 5.13, I could start new VMs, but I could not
> resume the saved VMs - it would spit out:
>
> 2021-09-18T11:14:14.456227Z qemu-system-aarch64: Invalid value 236 expecting positive value <= 162
> 2021-09-18T11:14:14.456269Z qemu-system-aarch64: Failed to load cpu:cpreg_vmstate_array_len
> 2021-09-18T11:14:14.456279Z qemu-system-aarch64: error while loading state for instance 0x0 of device 'cpu'
> 2021-09-18T11:14:14.456887Z qemu-system-aarch64: load of migration failed: Invalid argument
>
> Essentially, what this is complaining about is that the register
> list returned by the KVM_GET_REG_LIST ioctl has reduced in size,
> meaning that the saved VM has more registers that need to be set
> (236 of them, after QEMU's filtering) than those which are actually
> available under 5.13 (162 after QEMU's filtering).
>
> After much debugging, and writing a program to create a VM and CPU,
> and retrieve the register list etc, I have finally tracked down
> exactly what is going on...
>
> Under 5.13, KVM believes there is no PMU, so it doesn't advertise the
> PMU registers, despite the hardware having a PMU and the kernel
> having support.
>
> kvm_arm_support_pmu_v3() determines whether the PMU_V3 feature is
> available or not.
>
> Under 5.11, this was determined via:
>
> perf_num_counters() > 0
>
> which in turn was derived from (essentially):
>
> __oprofile_cpu_pmu && __oprofile_cpu_pmu->num_events > 0
>
> __oprofile_cpu_pmu can be set at any time when the PMU has been
> initialised, and due to initialisation ordering, it appears to happen
> much later in the kernel boot.
>
> However, under 5.13, oprofile has been removed, and this test is no
> longer present. Instead, kvm attempts to determine the availability
> of PMUv3 when it initialises, and fails to because the PMU has not
> yet been initialised. If there is no PMU at the point KVM initialises,
> then KVM will never advertise a PMU.
>
> This change of behaviour is what breaks KVM on Armada 8040, causing
> a userspace regression.
>
> What makes this more confusing is the PMU errors have gone:
>
> 5.13:
> [ 0.170514] PCI: CLS 0 bytes, default 64
> [ 0.171085] kvm [1]: IPA Size Limit: 44 bits
> [ 0.172532] kvm [1]: vgic interrupt IRQ9
> [ 0.172650] kvm: pmu event creation failed -2
> [ 0.172688] kvm [1]: Hyp mode initialized successfully
> ...
> [ 1.479833] hw perfevents: enabled with armv8_cortex_a72 PMU driver, 7 counters available
>
[...]
Urgh. That's a bummer. T1he PMU driver only comes up once it has found
its interrupt controller, which on the Armada 8040 is not the GIC, but
some weird thing on the side that doesn't actually serve any real
purpose. On HW where the PMU is directly wired into the GIC, it all
works fine, though by luck rather than by design.
Anyway, rant over. This is a bug that needs addressing so that KVM can
initialise correctly irrespective of the probing order. This probably
means that the static key controlling KVM's behaviour wrt the PMU must
be controlled by the PMU infrastructure itself, rather than KVM trying
to probe for it.
Can you please give the following hack a go (on top of 5.15-rc1)? I've
briefly tested it on my McBin, and it did the trick. I've also tested
it on the M1 (which really doesn't have an architectural PMU) to
verify that it was correctly failing.
Thanks,
M.
From 9c26e3e6bbcbc3a583b3974e7a9017029d31fe29 Mon Sep 17 00:00:00 2001
From: Marc Zyngier <maz at kernel.org>
Date: Sun, 19 Sep 2021 14:09:49 +0100
Subject: [PATCH] KVM: arm64: Fix PMU probe ordering
Russell reported that since 5.13, KVM's probing of the PMU has
started to fail on his HW. As it turns out, there is an implicit
ordering dependency between the architectural PMU probing code and
and KVM's own probing. If, due to probe ordering reasons, KVM probes
before the PMU driver, it will fail to detect the PMU and prevent it
from being advertised to guests as well as the VMM.
Obviously, this is one probing too many, and we should be able to
deal with any ordering.
Add a callback from the PMU code into KVM to advertise the registration
of a host CPU PMU, allowing for any probing order.
Fixes: 5421db1be3b1 ("KVM: arm64: Divorce the perf code from oprofile helpers")
Reported-by: "Russell King (Oracle)" <linux at armlinux.org.uk>
Signed-off-by: Marc Zyngier <maz at kernel.org>
Link: https://lore.kernel.org/r/YUYRKVflRtUytzy5@shell.armlinux.org.uk
Cc: stable at vger.kernel.org
---
arch/arm64/kvm/perf.c | 3 ---
arch/arm64/kvm/pmu-emul.c | 12 +++++++++++-
drivers/perf/arm_pmu.c | 2 ++
include/kvm/arm_pmu.h | 3 ---
include/linux/perf/arm_pmu.h | 6 ++++++
5 files changed, 19 insertions(+), 7 deletions(-)
diff --git a/arch/arm64/kvm/perf.c b/arch/arm64/kvm/perf.c
index f9bb3b14130e..c84fe24b2ea1 100644
--- a/arch/arm64/kvm/perf.c
+++ b/arch/arm64/kvm/perf.c
@@ -50,9 +50,6 @@ static struct perf_guest_info_callbacks kvm_guest_cbs = {
int kvm_perf_init(void)
{
- if (kvm_pmu_probe_pmuver() != ID_AA64DFR0_PMUVER_IMP_DEF && !is_protected_kvm_enabled())
- static_branch_enable(&kvm_arm_pmu_available);
-
return perf_register_guest_info_callbacks(&kvm_guest_cbs);
}
diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
index f5065f23b413..588100c52f34 100644
--- a/arch/arm64/kvm/pmu-emul.c
+++ b/arch/arm64/kvm/pmu-emul.c
@@ -740,7 +740,17 @@ void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, u64 data,
kvm_pmu_create_perf_event(vcpu, select_idx);
}
-int kvm_pmu_probe_pmuver(void)
+void kvm_host_pmu_init(struct arm_pmu *pmu)
+{
+ if (pmu->pmuver != 0 &&
+ pmu->pmuver != ID_AA64DFR0_PMUVER_IMP_DEF &&
+ !is_protected_kvm_enabled()) {
+ static_branch_enable(&kvm_arm_pmu_available);
+ kvm_info("PMU detected and enabled\n");
+ }
+}
+
+static int kvm_pmu_probe_pmuver(void)
{
struct perf_event_attr attr = { };
struct perf_event *event;
diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
index 3cbc3baf087f..295cc7952d0e 100644
--- a/drivers/perf/arm_pmu.c
+++ b/drivers/perf/arm_pmu.c
@@ -952,6 +952,8 @@ int armpmu_register(struct arm_pmu *pmu)
pmu->name, pmu->num_events,
has_nmi ? ", using NMIs" : "");
+ kvm_host_pmu_init(pmu);
+
return 0;
out_destroy:
diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
index 864b9997efb2..90f21898aad8 100644
--- a/include/kvm/arm_pmu.h
+++ b/include/kvm/arm_pmu.h
@@ -61,7 +61,6 @@ int kvm_arm_pmu_v3_get_attr(struct kvm_vcpu *vcpu,
int kvm_arm_pmu_v3_has_attr(struct kvm_vcpu *vcpu,
struct kvm_device_attr *attr);
int kvm_arm_pmu_v3_enable(struct kvm_vcpu *vcpu);
-int kvm_pmu_probe_pmuver(void);
#else
struct kvm_pmu {
};
@@ -118,8 +117,6 @@ static inline u64 kvm_pmu_get_pmceid(struct kvm_vcpu *vcpu, bool pmceid1)
return 0;
}
-static inline int kvm_pmu_probe_pmuver(void) { return 0xf; }
-
#endif
#endif
diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h
index 505480217cf1..2512e2f9cd4e 100644
--- a/include/linux/perf/arm_pmu.h
+++ b/include/linux/perf/arm_pmu.h
@@ -163,6 +163,12 @@ int arm_pmu_acpi_probe(armpmu_init_fn init_fn);
static inline int arm_pmu_acpi_probe(armpmu_init_fn init_fn) { return 0; }
#endif
+#ifdef CONFIG_KVM
+void kvm_host_pmu_init(struct arm_pmu *pmu);
+#else
+#define kvm_host_pmu_init(x) do { } while(0)
+#endif
+
/* Internal functions only for core arm_pmu code */
struct arm_pmu *armpmu_alloc(void);
struct arm_pmu *armpmu_alloc_atomic(void);
--
2.30.2
--
Without deviation from the norm, progress is not possible.
More information about the linux-arm-kernel
mailing list