[PATCH v7 3/4] KVM: arm64: PMU: Introduce FIXED_COUNTERS_ONLY
Akihiko Odaki
odaki at rsg.ci.i.u-tokyo.ac.jp
Wed Apr 22 00:02:05 PDT 2026
On 2026/04/20 22:53, Marc Zyngier wrote:
> On Mon, 20 Apr 2026 13:07:15 +0100,
> Akihiko Odaki <odaki at rsg.ci.i.u-tokyo.ac.jp> wrote:
>>
>> On 2026/04/20 18:51, Marc Zyngier wrote:
>>> On Mon, 20 Apr 2026 09:36:16 +0100,
>>> Akihiko Odaki <odaki at rsg.ci.i.u-tokyo.ac.jp> wrote:
>>>>
>>>> On 2026/04/20 2:19, Marc Zyngier wrote:
>>>>> On Sat, 18 Apr 2026 09:14:25 +0100,
>>>>> Akihiko Odaki <odaki at rsg.ci.i.u-tokyo.ac.jp> wrote:
>>>>>>
>>>>>> On a heterogeneous arm64 system, KVM's PMU emulation is based on the
>>>>>> features of a single host PMU instance. When a vCPU is migrated to a
>>>>>> pCPU with an incompatible PMU, counters such as PMCCNTR_EL0 stop
>>>>>> incrementing.
>>>>>>
>>>>>> Although this behavior is permitted by the architecture, Windows does
>>>>>> not handle it gracefully and may crash with a division-by-zero error.
>>>>>>
>>>>>> The current workaround requires VMMs to pin vCPUs to a set of pCPUs
>>>>>> that share a compatible PMU. This is difficult to implement correctly in
>>>>>> QEMU/libvirt, where pinning occurs after vCPU initialization, and it
>>>>>> also restricts the guest to a subset of available pCPUs.
>>>>>>
>>>>>> Introduce the KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY attribute to
>>>>>> create a "fixed-counters-only" PMU. When set, KVM exposes a PMU that is
>>>>>> compatible with all pCPUs but that does not support programmable
>>>>>> event counters which may have different feature sets on different PMUs.
>>>>>>
>>>>>> This allows Windows guests to run reliably on heterogeneous systems
>>>>>> without crashing, even without vCPU pinning, and enables VMMs to
>>>>>> schedule vCPUs across all available pCPUs, making full use of the host
>>>>>> hardware.
>>>>>>
>>>>>> Much like KVM_ARM_VCPU_PMU_V3_IRQ and other read-write attributes, this
>>>>>> attribute provides a getter that facilitates kernel and userspace
>>>>>> debugging/testing.
>>>>>
>>>>> OK, so that's the sales pitch. But how is it implemented? I would like
>>>>> to be able to read a high-level description of the implementation
>>>>> trade-offs.
>>>>
>>>> Implementation-wise it is very trivial. Essentially the following
>>>> addition in kvm_arm_pmu_v3_get_attr() is the entire implementation:
>>>> + case KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY:
>>>> + if (test_bit(KVM_ARCH_FLAG_PMU_V3_FIXED_COUNTERS_ONLY,
>>>> &vcpu->kvm->arch.flags))
>>>> + return 0;
>>>>
>>>> Both its functionality and code complexity is trivial. So we can argue that:
>>>> - the functionality is too trivial to be useful or
>>>> - the interface/implementation complexity is so trivial that it does not
>>>> incur maintenance burden
>>>>
>>>> In this case the selftest uses the getter so I was more inclined to
>>>> have it, but adding one just for the selftest sounds too ad-hoc, so
>>>> here I looked into other attributes to ensure that it was not
>>>> introducing inconsistency with existing interfaces.
>>>>
>>>> As the result, I found there are other read-write attributes; in fact
>>>> there are more read-write attributes than write-only ones.
>>>
>>> You're completely missing the point. I'm referring to the whole of the
>>> commit message, which is more of a marketing slide than a technical
>>> description.
>>
>> In terms of implementation, the obvious tradeoff is that it adds more
>> code to implement the feature. One thing to note is that
>> kvm_vcpu_load_pmu() is added and is called each time a vCPU migrates
>> across pCPUs. The heavy part, making the KVM_REQ_RELOAD_PMU request,
>> only happens when the feature is enabled.
>
> Well, that's what I want to see. The repeated blurb about Windows
> being broken is cover letter material, but not fir for a commit
> message.
I understand. I'll leave the Windows issue the cover letter and write
focused patch messages after splitting patches.
>
> [...]
>
>>>>> "for the first time" gives the impression that it will work if you try
>>>>> again. I'd rather we say that "This feature is incompatible with the
>>>>> existence of a PMU event filter".
>>>>
>>>> The following sequence will work:
>>>> 1. Set KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY
>>>> 2. Set KVM_ARM_VCPU_PMU_V3_FILTER
>>>> 3. Set KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY
>>>>
>>>> This is to make the behavior conistent with KVM_ARM_VCPU_PMU_V3_SET_PMU.
>>>
>>> I don't think this is correct. Filtering is completely at odds with
>>> this patch, and I don't want to have to reason about the combination.
>>
>> kvm_arm_pmu_v3_set_pmu() has the following condition:
>>
>> if (kvm_vm_has_ran_once(kvm) ||
>> (kvm->arch.pmu_filter && kvm->arch.arm_pmu != arm_pmu)) {
>> ret = -EBUSY;
>> break;
>> }
>>
>> kvm_arm_pmu_v3_set_pmu_fixed_counters_only() has the corresponding
>> condition for consistency:
>>
>> if (kvm_vm_has_ran_once(kvm) ||
>> (kvm->arch.pmu_filter &&
>> !test_bit(KVM_ARCH_FLAG_PMU_V3_FIXED_COUNTERS_ONLY,
>> &kvm->arch.flags)))
>> return -EBUSY;
>>
>> We can of course kill the PMU event filter for
>> FIXED_COUNTERS_ONLY. The filter is effectively no-op with
>> FIXED_COUNTERS_ONLY and I don't think that consistency matters much.
>
> We shouldn't allow weird combinations in the UAPI. Since it makes no
> sense to have both fixed-function *and* filters, we should make them
> mutually exclusive.
Then I think it makes sense to make KVM_ARM_VCPU_PMU_V3_SET_NR_COUNTERS
mutually exclusive for consistency, but I found something weird with it.
The documentation says it already "mandates that a PMU has explicitly
been selected via KVM_ARM_VCPU_PMU_V3_SET_PMU", but apparently that's
not properly implemented.
kvm_arm_pmu_v3_set_nr_counters() has the following check:
if (!kvm->arch.arm_pmu)
return -EINVAL;
I suspect it is intended to check if a PMU has explicitly been selected,
but this check is effectively no-op because kvm_arm_set_default_pmu()
has already set kvm->arch.arm_pmu before reaching there.
Furthermore, tools/testing/selftests/kvm/arm64/vpmu_counter_access.c
seems to expect KVM_ARM_VCPU_PMU_V3_SET_NR_COUNTERS to work without
selecting a PMU via KVM_ARM_VCPU_PMU_V3_SET_PMU.
How should we deal with the discrepancy between the documentation and
the implementation/selftest?
>
> [...]
>
>>>> I expect migration will be handled with the conventional register
>>>> getters and setters, but please share if you have a concern.
>>>
>>> At the very least I want to see some documentation explaining that.
>>
>> What kind of documentation do you expect?
>
> A description of what counters are exposed by this feature, and what
> architectural features they are dependent on.
I'll update the attribute documentation accordingly.
>
>> If we change kvm_vcpu_load_pmu() to avoid for_each_set_bit(), there
>> would be a good chance to forget updating it when mechanically
>> updating existing for_each_set_bit() instances, so it is a candidate
>> for documentation. But I don't have a good idea where to place it
>> either.
>
> The moment we introduce FEAT_PMUv3_ICNTR, the whole PMUv3 emulation
> will catch fire anyway, as we already limit ourselves to 32 bits for
> reset and nesting switch.
>
> So this will be a major redesign anyway. If you are really worried,
> leave a comment in there.
I'll leave one line comment in the implementation corresponding to the
attribute documentation.
Regards,
Akihiko Odaki
More information about the linux-arm-kernel
mailing list