[PATCH v3 1/2] KVM: arm64: PMU: Introduce FIXED_COUNTERS_ONLY

Akihiko Odaki odaki at rsg.ci.i.u-tokyo.ac.jp
Thu Feb 26 06:43:21 PST 2026


On 2026/02/26 20:54, Oliver Upton wrote:
> Hi Akihiko,
> 
> On Wed, Feb 25, 2026 at 01:31:15PM +0900, Akihiko Odaki wrote:
>> @@ -629,6 +629,8 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>>   		kvm_vcpu_load_vhe(vcpu);
>>   	kvm_arch_vcpu_load_fp(vcpu);
>>   	kvm_vcpu_pmu_restore_guest(vcpu);
>> +	if (test_bit(KVM_ARCH_FLAG_PMU_V3_FIXED_COUNTERS_ONLY, &vcpu->kvm->arch.flags))
>> +		kvm_make_request(KVM_REQ_CREATE_PMU, vcpu);
> 
> We only need to set the request if the vCPU has migrated to a different
> PMU implementation, no?

Indeed. I was too lazy to implement such a check since it won't affect 
performance unless the new feature is requested, but having one may 
still be nice.

> 
>>   	if (kvm_arm_is_pvtime_enabled(&vcpu->arch))
>>   		kvm_make_request(KVM_REQ_RECORD_STEAL, vcpu);
>>   
>> @@ -1056,6 +1058,9 @@ static int check_vcpu_requests(struct kvm_vcpu *vcpu)
>>   		if (kvm_check_request(KVM_REQ_RELOAD_PMU, vcpu))
>>   			kvm_vcpu_reload_pmu(vcpu);
>>   
>> +		if (kvm_check_request(KVM_REQ_CREATE_PMU, vcpu))
>> +			kvm_vcpu_create_pmu(vcpu);
>> +
> 
> My strong preference would be to squash the migration handling into
> kvm_vcpu_reload_pmu(). It is already reprogramming PMU events in
> response to other things.

Can you share a reason for that?

In terms of complexity, I don't think it will help reduce complexity 
since the only thing kvm_vcpu_reload_pmu() and kvm_vcpu_create_pmu() 
have in common is the enumeration of enabled counters, which is simple 
enough.

In terms of performance, I guess it is better to keep 
kvm_vcpu_create_pmu() small since it is triggered for each migration.

> 
>>   		if (kvm_check_request(KVM_REQ_RESYNC_PMU_EL0, vcpu))
>>   			kvm_vcpu_pmu_restore_guest(vcpu);
>>   
>> @@ -1516,7 +1521,8 @@ static int kvm_setup_vcpu(struct kvm_vcpu *vcpu)
>>   	 * When the vCPU has a PMU, but no PMU is set for the guest
>>   	 * yet, set the default one.
>>   	 */
>> -	if (kvm_vcpu_has_pmu(vcpu) && !kvm->arch.arm_pmu)
>> +	if (kvm_vcpu_has_pmu(vcpu) && !kvm->arch.arm_pmu &&
>> +	    !test_bit(KVM_ARCH_FLAG_PMU_V3_FIXED_COUNTERS_ONLY, &kvm->arch.flags))
>>   		ret = kvm_arm_set_default_pmu(kvm);
> 
> I'd rather just initialize it to a default than have to deal with the
> field being sometimes null.

I agree. I'll change this with the next version.

> 
>> -static bool kvm_pmu_counter_is_enabled(struct kvm_pmc *pmc)
>> +static u64 kvm_pmu_enabled_counter_mask(struct kvm_vcpu *vcpu)
>>   {
>> -	struct kvm_vcpu *vcpu = kvm_pmc_to_vcpu(pmc);
>> -	unsigned int mdcr = __vcpu_sys_reg(vcpu, MDCR_EL2);
>> +	u64 mask = 0;
>>   
>> -	if (!(__vcpu_sys_reg(vcpu, PMCNTENSET_EL0) & BIT(pmc->idx)))
>> -		return false;
>> +	if (__vcpu_sys_reg(vcpu, MDCR_EL2) & MDCR_EL2_HPME)
>> +		mask |= kvm_pmu_hyp_counter_mask(vcpu);
>>   
>> -	if (kvm_pmu_counter_is_hyp(vcpu, pmc->idx))
>> -		return mdcr & MDCR_EL2_HPME;
>> +	if (kvm_vcpu_read_pmcr(vcpu) & ARMV8_PMU_PMCR_E)
>> +		mask |= ~kvm_pmu_hyp_counter_mask(vcpu);
>>   
>> -	return kvm_vcpu_read_pmcr(vcpu) & ARMV8_PMU_PMCR_E;
>> +	return __vcpu_sys_reg(vcpu, PMCNTENSET_EL0) & mask;
>> +}
>> +
>> +static bool kvm_pmu_counter_is_enabled(struct kvm_pmc *pmc)
>> +{
>> +	struct kvm_vcpu *vcpu = kvm_pmc_to_vcpu(pmc);
>> +
>> +	return kvm_pmu_enabled_counter_mask(vcpu) & BIT(pmc->idx);
>>   }
> 
> You're churning a good bit of code, this needs to happen in a separate
> patch (if at all).

It makes sense. The next version will have a separate patch for this.

> 
>> @@ -689,6 +710,14 @@ static void kvm_pmu_create_perf_event(struct kvm_pmc *pmc)
>>   	int eventsel;
>>   	u64 evtreg;
>>   
>> +	if (!arm_pmu) {
>> +		arm_pmu = kvm_pmu_probe_armpmu(vcpu->cpu);
> 
> kvm_pmu_probe_armpmu() takes a global mutex, I'm not sure that's what we
> want.
> 
> What prevents us from opening a PERF_TYPE_RAW event and allowing perf to
> work out the right PMU for this CPU?

Unfortunately, perf does not seem to have the capability to switch to 
the right PMU. tools/perf/Documentation/intel-hybrid.txt says the perf 
tool creates events for each PMU in a hybrid configuration, for example.

> 
>> +		if (!arm_pmu) {
>> +			vcpu_set_on_unsupported_cpu(vcpu);
> 
> At this point it seems pretty late to flag the CPU as unsupported. Maybe
> instead we can compute the union cpumask for all the PMU implemetations
> the VM may schedule on.

This is just a safeguard; it is the responsibility of userspace to 
schedule the vCPU properly. It is conceptually the same as what 
kvm_arch_vcpu_load() does when migrating to an unsupported CPU.

> 
>> @@ -1249,6 +1299,10 @@ int kvm_arm_pmu_v3_get_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
>>   		irq = vcpu->arch.pmu.irq_num;
>>   		return put_user(irq, uaddr);
>>   	}
>> +	case KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY:
>> +		lockdep_assert_held(&vcpu->kvm->arch.config_lock);
>> +		if (test_bit(KVM_ARCH_FLAG_PMU_V3_FIXED_COUNTERS_ONLY, &vcpu->kvm->arch.flags))
>> +			return 0;
> 
> We don't need a getter for this, userspace should remember how it
> provisioned the VM.

The getter is useful for debugging and testing. The selftest will use it 
to query the current state.

Regards,
Akihiko Odaki
