[PATCH v1 2/2] KVM: arm64: PMU: Ensure to trap PMU access from EL0 to EL2
Marc Zyngier
maz at kernel.org
Wed Mar 29 05:03:18 PDT 2023
On Wed, 29 Mar 2023 01:21:36 +0100,
Reiji Watanabe <reijiw at google.com> wrote:
>
> Currently, with VHE, KVM sets ER, CR, SW and EN bits of
> PMUSERENR_EL0 to 1 on vcpu_load(). So, if the value of those bits
> are cleared after vcpu_load() (the perf subsystem would do when PMU
> counters are programmed for the guest), PMU access from the guest EL0
> might be trapped to the guest EL1 directly regardless of the current
> PMUSERENR_EL0 value of the vCPU.
+ RobH.
Is that what is done when the event is created and armv8pmu_start()
called? This is... crap. The EL0 access thing breaks everything, and
nobody tested it with KVM, obviously.
I would be tempted to start mitigating it with the following:
diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index dde06c0f97f3..8063525bf3dd 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -806,17 +806,19 @@ static void armv8pmu_disable_event(struct perf_event *event)
static void armv8pmu_start(struct arm_pmu *cpu_pmu)
{
- struct perf_event_context *ctx;
- int nr_user = 0;
+ if (sysctl_perf_user_access) {
+ struct perf_event_context *ctx;
+ int nr_user = 0;
- ctx = perf_cpu_task_ctx();
- if (ctx)
- nr_user = ctx->nr_user;
+ ctx = perf_cpu_task_ctx();
+ if (ctx)
+ nr_user = ctx->nr_user;
- if (sysctl_perf_user_access && nr_user)
- armv8pmu_enable_user_access(cpu_pmu);
- else
- armv8pmu_disable_user_access();
+ if (nr_user)
+ armv8pmu_enable_user_access(cpu_pmu);
+ else
+ armv8pmu_disable_user_access();
+ }
/* Enable all counters */
armv8pmu_pmcr_write(armv8pmu_pmcr_read() | ARMV8_PMU_PMCR_E);
but that's obviously not enough as we want it to work with EL0 access
enabled on the host as well.
What we miss is something that tells the PMU code "we're in a context
where host userspace isn't present", and this would be completely
skipped, relying on KVM to restore the appropriate state on
vcpu_put(). But then the IPI stuff that controls EL0 can always come
in and wreck things. Gahhh...
I'm a bit reluctant to use the "save/restore all the time" hammer,
because it only hides that the EL0 counter infrastructure is a bit
broken.
> With VHE, fix this by setting those bits of the register on every
> guest entry (as with nVHE). Also, opportunistically make the similar
> change for PMSELR_EL0, which is cleared by vcpu_load(), to ensure it
> is always set to zero on guest entry (PMXEVCNTR_EL0 access might cause
> UNDEF at EL1 instead of being trapped to EL2, depending on the value
> of PMSELR_EL0). I think that would be more robust, although I don't
> find any kernel code that writes PMSELR_EL0.
This was changed a while ago to avoid using the selection register,
see 0fdf1bb75953 ("arm64: perf: Avoid PMXEV* indirection"), and the
rationale behind the reset of PMSELR_EL0 in 21cbe3cc8a48 ("arm64: KVM:
pmu: Reset PMSELR_EL0.SEL to a sane value before entering the guest").
We *could* simply drop this zeroing of PMSELR_EL0 now that there is
nothing else host-side that writes to it. But we need to agree on how
to fix the above first.
Thanks,
M.
--
Without deviation from the norm, progress is not possible.
More information about the linux-arm-kernel
mailing list