[PATCH v7 3/4] KVM: arm64: PMU: Introduce FIXED_COUNTERS_ONLY

Akihiko Odaki odaki at rsg.ci.i.u-tokyo.ac.jp
Wed Apr 22 00:02:05 PDT 2026


On 2026/04/20 22:53, Marc Zyngier wrote:
> On Mon, 20 Apr 2026 13:07:15 +0100,
> Akihiko Odaki <odaki at rsg.ci.i.u-tokyo.ac.jp> wrote:
>>
>> On 2026/04/20 18:51, Marc Zyngier wrote:
>>> On Mon, 20 Apr 2026 09:36:16 +0100,
>>> Akihiko Odaki <odaki at rsg.ci.i.u-tokyo.ac.jp> wrote:
>>>>
>>>> On 2026/04/20 2:19, Marc Zyngier wrote:
>>>>> On Sat, 18 Apr 2026 09:14:25 +0100,
>>>>> Akihiko Odaki <odaki at rsg.ci.i.u-tokyo.ac.jp> wrote:
>>>>>>
>>>>>> On a heterogeneous arm64 system, KVM's PMU emulation is based on the
>>>>>> features of a single host PMU instance. When a vCPU is migrated to a
>>>>>> pCPU with an incompatible PMU, counters such as PMCCNTR_EL0 stop
>>>>>> incrementing.
>>>>>>
>>>>>> Although this behavior is permitted by the architecture, Windows does
>>>>>> not handle it gracefully and may crash with a division-by-zero error.
>>>>>>
>>>>>> The current workaround requires VMMs to pin vCPUs to a set of pCPUs
>>>>>> that share a compatible PMU. This is difficult to implement correctly in
>>>>>> QEMU/libvirt, where pinning occurs after vCPU initialization, and it
>>>>>> also restricts the guest to a subset of available pCPUs.
>>>>>>
>>>>>> Introduce the KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY attribute to
>>>>>> create a "fixed-counters-only" PMU. When set, KVM exposes a PMU that is
>>>>>> compatible with all pCPUs but that does not support programmable
>>>>>> event counters which may have different feature sets on different PMUs.
>>>>>>
>>>>>> This allows Windows guests to run reliably on heterogeneous systems
>>>>>> without crashing, even without vCPU pinning, and enables VMMs to
>>>>>> schedule vCPUs across all available pCPUs, making full use of the host
>>>>>> hardware.
>>>>>>
>>>>>> Much like KVM_ARM_VCPU_PMU_V3_IRQ and other read-write attributes, this
>>>>>> attribute provides a getter that facilitates kernel and userspace
>>>>>> debugging/testing.
>>>>>
>>>>> OK, so that's the sales pitch. But how is it implemented? I would like
>>>>> to be able to read a high-level description of the implementation
>>>>> trade-offs.
>>>>
>>>> Implementation-wise it is very trivial. Essentially the following
>>>> addition in kvm_arm_pmu_v3_get_attr() is the entire implementation:
>>>> +	case KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY:
>>>> +		if (test_bit(KVM_ARCH_FLAG_PMU_V3_FIXED_COUNTERS_ONLY,
>>>> &vcpu->kvm->arch.flags))
>>>> +			return 0;
>>>>
>>>> Both its functionality and code complexity is trivial. So we can argue that:
>>>> - the functionality is too trivial to be useful or
>>>> - the interface/implementation complexity is so trivial that it does not
>>>>     incur maintenance burden
>>>>
>>>> In this case the selftest uses the getter so I was more inclined to
>>>> have it, but adding one just for the selftest sounds too ad-hoc, so
>>>> here I looked into other attributes to ensure that it was not
>>>> introducing inconsistency with existing interfaces.
>>>>
>>>> As the result, I found there are other read-write attributes; in fact
>>>> there are more read-write attributes than write-only ones.
>>>
>>> You're completely missing the point. I'm referring to the whole of the
>>> commit message, which is more of a marketing slide than a technical
>>> description.
>>
>> In terms of implementation, the obvious tradeoff is that it adds more
>> code to implement the feature. One thing to note is that
>> kvm_vcpu_load_pmu() is added and is called each time a vCPU migrates
>> across pCPUs. The heavy part, making the KVM_REQ_RELOAD_PMU request,
>> only happens when the feature is enabled.
> 
> Well, that's what I want to see. The repeated blurb about Windows
> being broken is cover letter material, but not fir for a commit
> message.

I understand. I'll leave the Windows issue the cover letter and write 
focused patch messages after splitting patches.

> 
> [...]
> 
>>>>> "for the first time" gives the impression that it will work if you try
>>>>> again. I'd rather we say that "This feature is incompatible with the
>>>>> existence of a PMU event filter".
>>>>
>>>> The following sequence will work:
>>>> 1. Set KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY
>>>> 2. Set KVM_ARM_VCPU_PMU_V3_FILTER
>>>> 3. Set KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY
>>>>
>>>> This is to make the behavior conistent with KVM_ARM_VCPU_PMU_V3_SET_PMU.
>>>
>>> I don't think this is correct. Filtering is completely at odds with
>>> this patch, and I don't want to have to reason about the combination.
>>
>> kvm_arm_pmu_v3_set_pmu() has the following condition:
>>
>> if (kvm_vm_has_ran_once(kvm) ||
>>      (kvm->arch.pmu_filter && kvm->arch.arm_pmu != arm_pmu)) {
>> 	ret = -EBUSY;
>> 	break;
>> }
>>
>> kvm_arm_pmu_v3_set_pmu_fixed_counters_only() has the corresponding
>> condition for consistency:
>>
>> if (kvm_vm_has_ran_once(kvm) ||
>>      (kvm->arch.pmu_filter &&
>>       !test_bit(KVM_ARCH_FLAG_PMU_V3_FIXED_COUNTERS_ONLY,
>> 	       &kvm->arch.flags)))
>> 	return -EBUSY;
>>
>> We can of course kill the PMU event filter for
>> FIXED_COUNTERS_ONLY. The filter is effectively no-op with
>> FIXED_COUNTERS_ONLY and I don't think that consistency matters much.
> 
> We shouldn't allow weird combinations in the UAPI. Since it makes no
> sense to have both fixed-function *and* filters, we should make them
> mutually exclusive.

Then I think it makes sense to make KVM_ARM_VCPU_PMU_V3_SET_NR_COUNTERS 
mutually exclusive for consistency, but I found something weird with it.

The documentation says it already "mandates that a PMU has explicitly 
been selected via KVM_ARM_VCPU_PMU_V3_SET_PMU", but apparently that's 
not properly implemented.

kvm_arm_pmu_v3_set_nr_counters() has the following check:
	if (!kvm->arch.arm_pmu)
		return -EINVAL;

I suspect it is intended to check if a PMU has explicitly been selected, 
but this check is effectively no-op because kvm_arm_set_default_pmu() 
has already set kvm->arch.arm_pmu before reaching there.

Furthermore, tools/testing/selftests/kvm/arm64/vpmu_counter_access.c 
seems to expect KVM_ARM_VCPU_PMU_V3_SET_NR_COUNTERS to work without 
selecting a PMU via KVM_ARM_VCPU_PMU_V3_SET_PMU.

How should we deal with the discrepancy between the documentation and 
the implementation/selftest?

> 
> [...]
> 
>>>> I expect migration will be handled with the conventional register
>>>> getters and setters, but please share if you have a concern.
>>>
>>> At the very least I want to see some documentation explaining that.
>>
>> What kind of documentation do you expect?
> 
> A description of what counters are exposed by this feature, and what
> architectural features they are dependent on.

I'll update the attribute documentation accordingly.

> 
>> If we change kvm_vcpu_load_pmu() to avoid for_each_set_bit(), there
>> would be a good chance to forget updating it when mechanically
>> updating existing for_each_set_bit() instances, so it is a candidate
>> for documentation. But I don't have a good idea where to place it
>> either.
> 
> The moment we introduce FEAT_PMUv3_ICNTR, the whole PMUv3 emulation
> will catch fire anyway, as we already limit ourselves to 32 bits for
> reset and nesting switch.
> 
> So this will be a major redesign anyway. If you are really worried,
> leave a comment in there.

I'll leave one line comment in the implementation corresponding to the 
attribute documentation.

Regards,
Akihiko Odaki



More information about the linux-arm-kernel mailing list