[PATCH v2 3/8] KVM: arm64: PMU: Preserve vCPU's PMCR_EL0.N value on vCPU reset

Reiji Watanabe reijiw at google.com
Fri Jan 20 10:53:02 PST 2023


Hi Oliver, Marc,

Thank you for the review!

On Fri, Jan 20, 2023 at 10:05 AM Oliver Upton <oliver.upton at linux.dev> wrote:
>
> Hey Marc,
>
> On Fri, Jan 20, 2023 at 12:12:32PM +0000, Marc Zyngier wrote:
> > On Fri, 20 Jan 2023 00:30:33 +0000, Oliver Upton <oliver.upton at linux.dev> wrote:
> > > I think we need to derive a sanitised value for PMCR_EL0.N, as I believe
> > > nothing in the architecture prevents implementers from gluing together
> > > cores with varying numbers of PMCs. We probably haven't noticed it yet
> > > since it would appear all Arm designs have had 6 PMCs.
> >
> > This brings back the question of late onlining. How do you cope with
> > with the onlining of such a CPU that has a smaller set of counters
> > than its online counterparts? This is at odds with the way the PMU
> > code works.
>
> You're absolutely right, any illusion we derived from the online set of
> CPUs could fall apart with a late onlining of a different core.
>
> > If you have a different set of counters, you are likely to have a
> > different PMU altogether:
> >
> > [    1.192606] hw perfevents: enabled with armv8_cortex_a57 PMU driver, 7 counters available
> > [    1.201254] hw perfevents: enabled with armv8_cortex_a53 PMU driver, 7 counters available
> >
> > This isn't a broken system, but it has two set of cores which are
> > massively different, and two PMUs.
> >
> > This really should tie back to the PMU type we're counting on, and to
> > the set of CPUs that implements it. We already have some
> > infrastructure to check for the affinity of the PMU vs the CPU we're
> > running on, and this is already visible to userspace.
> >
> > Can't we just leave this responsibility to userspace?
>
> Believe me, I'm always a fan of offloading things to userspace :)
>
> If the VMM is privy to the details of the system it is on then the
> differing PMUs can be passed through to the guest w/ pinned vCPU
> threads. I just worry about the case of a naive VMM that assumes a
> homogenous system. I don't think I could entirely blame the VMM in this
> case either as we've gone to lengths to sanitise the feature set
> exposed to userspace.
>
> What happens when a vCPU gets scheduled on a core where the vPMU
> doesn't match? Ignoring other incongruences, it is not possible to
> virtualize more counters than are supported by the vPMU of the core.

I believe KVM_RUN will fail with KVM_EXIT_FAIL_ENTRY (please see
the code that handles ON_UNSUPPORTED_CPU).

> Stopping short of any major hacks in the kernel to fudge around the
> problem, I believe we may need to provide better documentation of how
> heterogeneous CPUs are handled in KVM and what userspace can do about
> it.

Documentation/virt/kvm/devices/vcpu.rst has some description of the
current behavior for KVM_ARM_VCPU_PMU_V3_SET_PMU at least
(though perhaps we need to update the documentation for this).

Now I'm a bit worried about the validation code for PMCR_EL0.N
as well, since setting (restoring) PMCR_EL0 could be done on any
pCPU (even before using KVM_ARM_VCPU_PMU_V3_SET_PMU).

What I am currently looking at is something like this:
 - Set the sanitised (min) value of PMCR_EL0.N among all PMUs
   for vCPUs by default.
 - Validate the PMCR_EL0.N value that userspace tries to set
   against the max value on the system (this ensures that
   restoring PMCR_EL0 for a vCPU works on any pCPU)
 - Make KVM_RUN fail when PMCR_EL0.N for the vCPU indicates
   more counters than the PMU that is set for the vCPU.

What do you think?

Thank you,
Reiji



More information about the linux-arm-kernel mailing list