[PATCH v7 3/4] KVM: arm64: PMU: Introduce FIXED_COUNTERS_ONLY
Akihiko Odaki
odaki at rsg.ci.i.u-tokyo.ac.jp
Mon Apr 20 05:07:15 PDT 2026
On 2026/04/20 18:51, Marc Zyngier wrote:
> On Mon, 20 Apr 2026 09:36:16 +0100,
> Akihiko Odaki <odaki at rsg.ci.i.u-tokyo.ac.jp> wrote:
>>
>> On 2026/04/20 2:19, Marc Zyngier wrote:
>>> On Sat, 18 Apr 2026 09:14:25 +0100,
>>> Akihiko Odaki <odaki at rsg.ci.i.u-tokyo.ac.jp> wrote:
>>>>
>>>> On a heterogeneous arm64 system, KVM's PMU emulation is based on the
>>>> features of a single host PMU instance. When a vCPU is migrated to a
>>>> pCPU with an incompatible PMU, counters such as PMCCNTR_EL0 stop
>>>> incrementing.
>>>>
>>>> Although this behavior is permitted by the architecture, Windows does
>>>> not handle it gracefully and may crash with a division-by-zero error.
>>>>
>>>> The current workaround requires VMMs to pin vCPUs to a set of pCPUs
>>>> that share a compatible PMU. This is difficult to implement correctly in
>>>> QEMU/libvirt, where pinning occurs after vCPU initialization, and it
>>>> also restricts the guest to a subset of available pCPUs.
>>>>
>>>> Introduce the KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY attribute to
>>>> create a "fixed-counters-only" PMU. When set, KVM exposes a PMU that is
>>>> compatible with all pCPUs but that does not support programmable
>>>> event counters which may have different feature sets on different PMUs.
>>>>
>>>> This allows Windows guests to run reliably on heterogeneous systems
>>>> without crashing, even without vCPU pinning, and enables VMMs to
>>>> schedule vCPUs across all available pCPUs, making full use of the host
>>>> hardware.
>>>>
>>>> Much like KVM_ARM_VCPU_PMU_V3_IRQ and other read-write attributes, this
>>>> attribute provides a getter that facilitates kernel and userspace
>>>> debugging/testing.
>>>
>>> OK, so that's the sales pitch. But how is it implemented? I would like
>>> to be able to read a high-level description of the implementation
>>> trade-offs.
>>
>> Implementation-wise it is very trivial. Essentially the following
>> addition in kvm_arm_pmu_v3_get_attr() is the entire implementation:
>> +	case KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY:
>> +		if (test_bit(KVM_ARCH_FLAG_PMU_V3_FIXED_COUNTERS_ONLY,
>> +			     &vcpu->kvm->arch.flags))
>> +			return 0;
>>
>> Both its functionality and code complexity are trivial. So we can argue that:
>> - the functionality is too trivial to be useful or
>> - the interface/implementation complexity is so trivial that it does not
>> incur maintenance burden
>>
>> In this case the selftest uses the getter so I was more inclined to
>> have it, but adding one just for the selftest sounds too ad-hoc, so
>> here I looked into other attributes to ensure that it was not
>> introducing inconsistency with existing interfaces.
>>
>> As a result, I found there are other read-write attributes; in fact
>> there are more read-write attributes than write-only ones.
>
> You're completely missing the point. I'm referring to the whole of the
> commit message, which is more of a marketing slide than a technical
> description.
In terms of implementation, the obvious tradeoff is that it adds more
code to implement the feature. One thing to note is that
kvm_vcpu_load_pmu() is added and is called each time a vCPU migrates
across pCPUs. The heavy part, making the KVM_REQ_RELOAD_PMU request,
only happens when the feature is enabled.
>
> I really don't care about the getter at this stage, which while
> pointless, does not make things more awful than they already are.
>
>>
>>>
>>>>
>>>> Signed-off-by: Akihiko Odaki <odaki at rsg.ci.i.u-tokyo.ac.jp>
>>>> ---
>>>> Documentation/virt/kvm/devices/vcpu.rst | 29 ++++++
>>>> arch/arm64/include/asm/kvm_host.h | 2 +
>>>> arch/arm64/include/uapi/asm/kvm.h | 1 +
>>>> arch/arm64/kvm/arm.c | 1 +
>>>> arch/arm64/kvm/pmu-emul.c | 155 +++++++++++++++++++++++---------
>>>> include/kvm/arm_pmu.h | 2 +
>>>> 6 files changed, 147 insertions(+), 43 deletions(-)
>>>>
>>>> diff --git a/Documentation/virt/kvm/devices/vcpu.rst b/Documentation/virt/kvm/devices/vcpu.rst
>>>> index 60bf205cb373..e0aeb1897d77 100644
>>>> --- a/Documentation/virt/kvm/devices/vcpu.rst
>>>> +++ b/Documentation/virt/kvm/devices/vcpu.rst
>>>> @@ -161,6 +161,35 @@ explicitly selected, or the number of counters is out of range for the
>>>> selected PMU. Selecting a new PMU cancels the effect of setting this
>>>> attribute.
>>>> +1.6 ATTRIBUTE: KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY
>>>> +------------------------------------------------------
>>>> +
>>>> +:Parameters: no additional parameter in kvm_device_attr.addr
>>>> +
>>>> +:Returns:
>>>> +
>>>> + ======= =====================================================
>>>> + -EBUSY Attempted to set after initializing PMUv3 or running
>>>> + VCPU, or attempted to set for the first time after
>>>> + setting an event filter
>>>> + -ENXIO Attempted to get before setting
>>>> + -ENODEV Attempted to set while PMUv3 not supported
>>>> + ======= =====================================================
>>>> +
>>>> +If set, PMUv3 will be emulated without programmable event counters. The VCPU
>>>> +will use any compatible hardware PMU. This attribute is particularly useful on
>>>
>>> Not quite "any PMU". It will use *the* PMU of the physical CPU,
>>> irrespective of the implementation.
>>
>> I think:
>>
>> - this comment
>> - one on the KVM_EXIT_FAIL_ENTRY_CPU_UNSUPPORTED note
>> - one on kvm_pmu_create_perf_event()
>> - and one on kvm_arm_pmu_v3_set_pmu_fixed_counters_only()
>>
>> all boil down to one question: will it support all possible CPUs, or
>> will it support a subset? Let me answer here:
>>
>> This patch is written to support a subset instead of all possible
>> CPUs. If a pCPU does not have a compatible PMU, the pCPU will not be
>> supported and cause KVM_EXIT_FAIL_ENTRY_CPU_UNSUPPORTED.
>
> This is not a thing. Either *all* the CPUs have a PMU that can be used
> for KVM, or PMU support is not offered to guests. That's a hard line
> in the sand. And the code already upholds this by checking the
> sanitised PMUVer field.
>
>>
>> This patch does not enforce that all possible CPUs are covered by the
>> compatible PMUs. Theoretically speaking,
>> kvm_arm_pmu_get_pmuver_limit() enables the PMU emulation when real
>> PMUv3 hardware covers all possible CPUs *or* the relevant registers
>> can be trapped with IMPDEF, so some pCPU may not have a compatible PMU
>> and only provide the IMPDEF trapping.
>
> How is that possible? Please describe the case where that can happen,
> and I will make sure that such a system stops booting. The intent is
> definitely that:
>
> - for early CPUs, we take the minimal capability of all CPUs
>
> - for late CPUs, either they match at least the capability recorded by
> early CPUs, or they don't boot.
All CPUs may trap the relevant registers with IMPDEF but some of them
may not have compatible PMUs. As I wrote in the previous email, I don't
think it will happen in practice.
>
>> Practically, I don't think any sane configuration will ever have such
>> a subset support, so we can explicitly enforce that all possible CPUs are
>> covered by the compatible PMUs if desired.
>
> That's not just desired. This is a requirement. And it is already
> enforced AFAICS.
>
>>
>>>
>>>> +heterogeneous systems where different hardware PMUs cover different physical
>>>> +CPUs. The compatibility of hardware PMUs can be checked with
>>>> +KVM_ARM_VCPU_PMU_V3_SET_PMU. All VCPUs in a VM share this attribute. It isn't
>>>> +possible to set it for the first time if a PMU event filter is already present.
>>>
>>> "for the first time" gives the impression that it will work if you try
>>> again. I'd rather we say that "This feature is incompatible with the
>>> existence of a PMU event filter".
>>
>> The following sequence will work:
>> 1. Set KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY
>> 2. Set KVM_ARM_VCPU_PMU_V3_FILTER
>> 3. Set KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY
>>
>> This is to make the behavior consistent with KVM_ARM_VCPU_PMU_V3_SET_PMU.
>
> I don't think this is correct. Filtering is completely at odds with
> this patch, and I don't want to have to reason about the combination.
kvm_arm_pmu_v3_set_pmu() has the following condition:
	if (kvm_vm_has_ran_once(kvm) ||
	    (kvm->arch.pmu_filter && kvm->arch.arm_pmu != arm_pmu)) {
		ret = -EBUSY;
		break;
	}
kvm_arm_pmu_v3_set_pmu_fixed_counters_only() has the corresponding
condition for consistency:
	if (kvm_vm_has_ran_once(kvm) ||
	    (kvm->arch.pmu_filter &&
	     !test_bit(KVM_ARCH_FLAG_PMU_V3_FIXED_COUNTERS_ONLY,
		       &kvm->arch.flags)))
		return -EBUSY;
We can of course kill the PMU event filter for FIXED_COUNTERS_ONLY. The
filter is effectively a no-op with FIXED_COUNTERS_ONLY and I don't think
that consistency matters much.
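To make the ordering concrete, here is a small user-space model of the
check in kvm_arm_pmu_v3_set_pmu_fixed_counters_only(). It is only a
sketch: struct vm_model, its fields and the model_*() helpers are names
made up for this mail and stand in for the kvm->arch state; the real code
uses test_bit() on kvm->arch.flags, and the filter setter modelled here
ignores whatever checks the real one performs.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the bits of struct kvm that the check reads. */
struct vm_model {
	bool has_ran_once;		/* models kvm_vm_has_ran_once() */
	bool has_filter;		/* models kvm->arch.pmu_filter != NULL */
	bool fixed_counters_only;	/* models the FIXED_COUNTERS_ONLY flag */
};

/*
 * Same shape as the condition quoted above: refuse once the VM has run,
 * or when a filter exists and the flag was not already set.
 */
int model_set_fixed_counters_only(struct vm_model *vm)
{
	if (vm->has_ran_once ||
	    (vm->has_filter && !vm->fixed_counters_only))
		return -EBUSY;

	vm->fixed_counters_only = true;
	return 0;
}

/* Installing a filter always succeeds in this sketch. */
int model_set_filter(struct vm_model *vm)
{
	vm->has_filter = true;
	return 0;
}

/* Walk through both orderings and assert the outcome of each step. */
void model_check_orderings(void)
{
	struct vm_model flag_first = { 0 };
	struct vm_model filter_first = { 0 };

	/* flag, filter, flag: the second flag write is accepted */
	assert(model_set_fixed_counters_only(&flag_first) == 0);
	assert(model_set_filter(&flag_first) == 0);
	assert(model_set_fixed_counters_only(&flag_first) == 0);

	/* filter, flag: the first flag write is refused */
	assert(model_set_filter(&filter_first) == 0);
	assert(model_set_fixed_counters_only(&filter_first) == -EBUSY);
}
```

If the filter is rejected outright for FIXED_COUNTERS_ONLY instead, the
second half of the condition would presumably reduce to a plain check of
the filter, and the "flag, filter, flag" ordering would no longer exist.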
>
> [...]
>
>>>> + int i;
>>>> +
>>>> + for_each_set_bit(i, &mask, 32) {
>>>> + pmc = kvm_vcpu_idx_to_pmc(vcpu, i);
>>>> + if (!pmc->perf_event)
>>>> + continue;
>>>> +
>>>> + cpu_pmu = to_arm_pmu(pmc->perf_event->pmu);
>>>> + if (!cpumask_test_cpu(vcpu->cpu, &cpu_pmu->supported_cpus)) {
>>>> + kvm_make_request(KVM_REQ_RELOAD_PMU, vcpu);
>>>> + break;
>>>> + }
>>>> + }
>>>> +}
>>>> +
>>>
>>> Why do we need to inflict this on VMs that do not have the fixed
>>> counter restriction?
>>
>> This function is to re-create the perf_event in case the current
>> perf_event does not support the pCPU because e.g., the pCPU is an
>> E-core while the perf_event only covers the P-cores.
>
> That's not what I meant. This code is only here to support the
> fixed-function feature. It makes no sense outside of it, because *we
> don't support counter migration across implementations*.
>
> So what's the purpose of this stuff for the normal KVM setup?
None. It's only for this feature. We can add a check of the feature flag
at the beginning of the function to avoid that loop.
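Roughly, the guarded version could look like the user-space sketch below.
The types and names (struct vcpu_model, struct counter_model,
model_vcpu_load_pmu(), the plain 64-bit mask standing in for
supported_cpus, the reload_requested field standing in for
KVM_REQ_RELOAD_PMU) are invented for illustration and are not the kernel
ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NR_COUNTERS 32

/* Hypothetical model of one counter and the PMU backing its perf event. */
struct counter_model {
	bool has_event;			/* models pmc->perf_event != NULL */
	uint64_t supported_cpus;	/* bit n set: the PMU covers pCPU n */
};

/* Hypothetical model of the vCPU state the load path looks at. */
struct vcpu_model {
	bool fixed_counters_only;	/* the feature flag */
	unsigned int cpu;		/* pCPU just loaded on, must be < 64 */
	uint32_t mask;			/* counters to inspect */
	struct counter_model pmc[MODEL_NR_COUNTERS];
	bool reload_requested;		/* stands in for KVM_REQ_RELOAD_PMU */
};

void model_vcpu_load_pmu(struct vcpu_model *vcpu)
{
	unsigned int i;

	/* Without the feature there is nothing to re-create: bail out early. */
	if (!vcpu->fixed_counters_only)
		return;

	for (i = 0; i < MODEL_NR_COUNTERS; i++) {
		const struct counter_model *pmc = &vcpu->pmc[i];

		if (!(vcpu->mask & (UINT32_C(1) << i)) || !pmc->has_event)
			continue;

		/* The event lives on a PMU that does not cover this pCPU. */
		if (!(pmc->supported_cpus & (UINT64_C(1) << vcpu->cpu))) {
			vcpu->reload_requested = true;
			break;
		}
	}
}
```

With the flag clear the function returns before touching any counter, so
a normal VM only pays for one flag test per load.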
>
>>
>>>
>>> And even then, all you have to reconfigure is the cycle counter. So
>>> why the loop? All we want to find out is whether the cycle counter is
>>> instantiated on the PMU that matches the current CPU.
>>
>> I just wanted to avoid hardcoding assumptions on the fixed
>> counter(s). FEAT_PMUv3_ICNTR will be naturally handled with a loop, for
>> example.
>
> Well, not that loop, since ICNTR is counter 32. So please let's stop
> the nonsense and only add what is required?
>
> [...]
>
>>>> +
>>>> clear_bit(KVM_ARCH_FLAG_PMU_V3_FIXED_COUNTERS_ONLY,
>>>> &kvm->arch.flags);
>>>
>>> Why does this need to be cleared? I'd rather we make sure it is never
>>> set in the first place.
>>
>> KVM_ARM_VCPU_PMU_V3_SET_PMU and
>> KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY can be set on the same
>> VCPU. The last KVM_ARM_VCPU_PMU_V3_SET_PMU or
>> KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY setting will be effective.
>>
>> A VMM may try to set these attributes to check if the setting is
>> supported. For example, the RFC QEMU patch first uses
>> KVM_ARM_VCPU_PMU_V3_SET_PMU to find a compatible PMU that covers all
>> pCPUs, and then falls back to
>> KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY. The order of such probing is
>> up to the VMM.
>
> KVM_ARM_VCPU_PMU_V3_SET_PMU is not a probing mechanism. You must probe
> the PMUs by looking in /sys/bus/event_source/devices/, like kvmtool
> does.
>
> So there is no reason to support this stuff, and the two flags should
> be made mutually exclusive.
Thanks for the pointer. I'll make a change to make the flags mutually
exclusive and test it with an amended QEMU patch that follows what
kvmtool does.
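For reference, here is a minimal sketch of that kind of probing: walk
/sys/bus/event_source/devices/ and read each entry's "cpus" file, which
lists the CPUs a PMU covers. This is not kvmtool's code. parse_cpulist()
and print_pmu_cpus() are names made up here, the parser only handles the
plain "a-b,c" list format with CPU numbers below 64, and skipping entries
without a "cpus" file is my assumption about how to tell CPU PMUs apart
from other event sources.

```c
#include <assert.h>
#include <dirent.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define EVENT_SOURCE_DIR "/sys/bus/event_source/devices"

/*
 * Parse a cpulist such as "0-3,6" into a bitmask (bit n = CPU n).
 * Returns 0 on success, -1 on malformed input or CPU numbers >= 64.
 */
int parse_cpulist(const char *s, uint64_t *mask)
{
	*mask = 0;

	while (*s && *s != '\n') {
		unsigned long lo, hi;
		char *end;

		lo = hi = strtoul(s, &end, 10);
		if (end == s)
			return -1;
		if (*end == '-') {
			s = end + 1;
			hi = strtoul(s, &end, 10);
			if (end == s)
				return -1;
		}
		if (lo > hi || hi >= 64)
			return -1;
		for (; lo <= hi; lo++)
			*mask |= UINT64_C(1) << lo;
		if (*end == ',')
			end++;
		else if (*end && *end != '\n')
			return -1;
		s = end;
	}
	return 0;
}

/* Print every event source that exposes a "cpus" file, with its CPU mask. */
int print_pmu_cpus(void)
{
	DIR *dir = opendir(EVENT_SOURCE_DIR);
	struct dirent *de;

	if (!dir)
		return -1;

	while ((de = readdir(dir)) != NULL) {
		char path[512], buf[256];
		uint64_t mask;
		FILE *f;

		if (de->d_name[0] == '.')
			continue;
		snprintf(path, sizeof(path), "%s/%s/cpus",
			 EVENT_SOURCE_DIR, de->d_name);
		f = fopen(path, "r");
		if (!f)
			continue;	/* no "cpus" file: not treated as a CPU PMU */
		if (fgets(buf, sizeof(buf), f) && !parse_cpulist(buf, &mask))
			printf("%s: cpus mask 0x%llx\n", de->d_name,
			       (unsigned long long)mask);
		fclose(f);
	}
	closedir(dir);
	return 0;
}

/* Quick sanity check of the parser on a fixed input. */
void parse_cpulist_selfcheck(void)
{
	uint64_t mask;

	assert(parse_cpulist("0-3,6\n", &mask) == 0 && mask == 0x4f);
}
```

A VMM could compare each mask against the set of pCPUs it intends to run
on and only fall back to FIXED_COUNTERS_ONLY when no single PMU covers
them all.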
>
> [...]
>
>>>>
>>>
>>> In conclusion, I find this patch to be rather messy. For a start, it
>>> needs to be split in at least 5 patches:
>>>
>>> - at least two for the refactoring
>>> - one for the PMU core changes
>>> - one for the UAPI
>>> - one for documentation
>>
>> That clarifies the expected granularity of patches. The next version
>> will be in that layout, perhaps with more patches if additional
>> changes are needed. Thanks for the guidance.
>>
>>>
>>> I'd also like some clarification on how this is intended to work if we
>>> enable FEAT_PMUv3_ICNTR, because the definition seems to be designed
>>> to encompass all fixed-function counters, and I expect this to grow
>>> over time.
>>
>> Indeed the UAPI was designed to encompass all fixed-function counters
>> as suggested by Oliver.
>>
>> To support the UAPI, the implementation avoids hardcoding the
>> assumption on the fixed counter(s). FEAT_PMUv3_ICNTR will be naturally
>> supported once the common code is properly updated (i.e., the size of
>> the event counter bitmask is grown and the corresponding registers are
>> wired up with a proper check of the feature).
>>
>> I expect migration will be handled with the conventional register
>> getters and setters, but please share if you have a concern.
>
> At the very least I want to see some documentation explaining that.
What kind of documentation do you expect? If we change
kvm_vcpu_load_pmu() to avoid for_each_set_bit(), there is a good chance
that someone will forget to update it when mechanically updating the
existing for_each_set_bit() instances, so it is a candidate for
documentation. But I don't have a good idea of where to place it either.
Regards,
Akihiko Odaki