[PATCH v3 1/2] KVM: arm64: PMU: Introduce FIXED_COUNTERS_ONLY

Akihiko Odaki odaki at rsg.ci.i.u-tokyo.ac.jp
Thu Feb 26 06:47:54 PST 2026


On 2026/02/26 23:43, Akihiko Odaki wrote:
> On 2026/02/26 20:54, Oliver Upton wrote:
>> Hi Akihiko,
>>
>> On Wed, Feb 25, 2026 at 01:31:15PM +0900, Akihiko Odaki wrote:
>>> @@ -629,6 +629,8 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>>>           kvm_vcpu_load_vhe(vcpu);
>>>       kvm_arch_vcpu_load_fp(vcpu);
>>>       kvm_vcpu_pmu_restore_guest(vcpu);
>>> +    if (test_bit(KVM_ARCH_FLAG_PMU_V3_FIXED_COUNTERS_ONLY, &vcpu->kvm->arch.flags))
>>> +        kvm_make_request(KVM_REQ_CREATE_PMU, vcpu);
>>
>> We only need to set the request if the vCPU has migrated to a different
>> PMU implementation, no?
> 
> Indeed. I was too lazy to implement such a check since it won't affect 
> performance unless the new feature is requested, but having one may 
> still be nice.
> 
>>
>>>       if (kvm_arm_is_pvtime_enabled(&vcpu->arch))
>>>           kvm_make_request(KVM_REQ_RECORD_STEAL, vcpu);
>>> @@ -1056,6 +1058,9 @@ static int check_vcpu_requests(struct kvm_vcpu *vcpu)
>>>           if (kvm_check_request(KVM_REQ_RELOAD_PMU, vcpu))
>>>               kvm_vcpu_reload_pmu(vcpu);
>>> +        if (kvm_check_request(KVM_REQ_CREATE_PMU, vcpu))
>>> +            kvm_vcpu_create_pmu(vcpu);
>>> +
>>
>> My strong preference would be to squash the migration handling into
>> kvm_vcpu_reload_pmu(). It is already reprogramming PMU events in
>> response to other things.
> 
> Can you share a reason for that?
> 
> In terms of complexity, I don't think it will help much, since the 
> only thing kvm_vcpu_reload_pmu() and kvm_vcpu_create_pmu() have in 
> common is the enumeration of enabled counters, which is simple enough.
> 
> In terms of performance, I guess it is better to keep 
> kvm_vcpu_create_pmu() small since it is triggered on each migration.
> 
>>
>>>           if (kvm_check_request(KVM_REQ_RESYNC_PMU_EL0, vcpu))
>>>               kvm_vcpu_pmu_restore_guest(vcpu);
>>> @@ -1516,7 +1521,8 @@ static int kvm_setup_vcpu(struct kvm_vcpu *vcpu)
>>>        * When the vCPU has a PMU, but no PMU is set for the guest
>>>        * yet, set the default one.
>>>        */
>>> -    if (kvm_vcpu_has_pmu(vcpu) && !kvm->arch.arm_pmu)
>>> +    if (kvm_vcpu_has_pmu(vcpu) && !kvm->arch.arm_pmu &&
>>> +        !test_bit(KVM_ARCH_FLAG_PMU_V3_FIXED_COUNTERS_ONLY, &kvm->arch.flags))
>>>           ret = kvm_arm_set_default_pmu(kvm);
>>
>> I'd rather just initialize it to a default than have to deal with the
>> field being sometimes null.
> 
> I agree. I'll change this with the next version.
> 
>>
>>> -static bool kvm_pmu_counter_is_enabled(struct kvm_pmc *pmc)
>>> +static u64 kvm_pmu_enabled_counter_mask(struct kvm_vcpu *vcpu)
>>>   {
>>> -    struct kvm_vcpu *vcpu = kvm_pmc_to_vcpu(pmc);
>>> -    unsigned int mdcr = __vcpu_sys_reg(vcpu, MDCR_EL2);
>>> +    u64 mask = 0;
>>> -    if (!(__vcpu_sys_reg(vcpu, PMCNTENSET_EL0) & BIT(pmc->idx)))
>>> -        return false;
>>> +    if (__vcpu_sys_reg(vcpu, MDCR_EL2) & MDCR_EL2_HPME)
>>> +        mask |= kvm_pmu_hyp_counter_mask(vcpu);
>>> -    if (kvm_pmu_counter_is_hyp(vcpu, pmc->idx))
>>> -        return mdcr & MDCR_EL2_HPME;
>>> +    if (kvm_vcpu_read_pmcr(vcpu) & ARMV8_PMU_PMCR_E)
>>> +        mask |= ~kvm_pmu_hyp_counter_mask(vcpu);
>>> -    return kvm_vcpu_read_pmcr(vcpu) & ARMV8_PMU_PMCR_E;
>>> +    return __vcpu_sys_reg(vcpu, PMCNTENSET_EL0) & mask;
>>> +}
>>> +
>>> +static bool kvm_pmu_counter_is_enabled(struct kvm_pmc *pmc)
>>> +{
>>> +    struct kvm_vcpu *vcpu = kvm_pmc_to_vcpu(pmc);
>>> +
>>> +    return kvm_pmu_enabled_counter_mask(vcpu) & BIT(pmc->idx);
>>>   }
>>
>> You're churning a good bit of code, this needs to happen in a separate
>> patch (if at all).
> 
> It makes sense. The next version will have a separate patch for this.
> 
>>
>>> @@ -689,6 +710,14 @@ static void kvm_pmu_create_perf_event(struct kvm_pmc *pmc)
>>>       int eventsel;
>>>       u64 evtreg;
>>> +    if (!arm_pmu) {
>>> +        arm_pmu = kvm_pmu_probe_armpmu(vcpu->cpu);
>>
>> kvm_pmu_probe_armpmu() takes a global mutex, I'm not sure that's what we
>> want.
>>
>> What prevents us from opening a PERF_TYPE_RAW event and allowing perf to
>> work out the right PMU for this CPU?
> 
> Unfortunately perf does not seem to have the capability to switch to 
> the right PMU. tools/perf/Documentation/intel-hybrid.txt says the perf 
> tool creates events for each PMU in a hybrid configuration, for example.

I think I misunderstood what you meant. Letting 
perf_event_create_kernel_counter() figure out which PMU to use may be a 
good idea. I'll give it a try in the next version.

> 
>>
>>> +        if (!arm_pmu) {
>>> +            vcpu_set_on_unsupported_cpu(vcpu);
>>
>> At this point it seems pretty late to flag the CPU as unsupported. Maybe
>> instead we can compute the union cpumask for all the PMU implemetations
>> the VM may schedule on.
> 
> This is just a safeguard; it is the responsibility of userspace to 
> schedule the vCPU properly. It is conceptually the same as what 
> kvm_arch_vcpu_load() does when migrating to an unsupported CPU.
> 
>>
>>> @@ -1249,6 +1299,10 @@ int kvm_arm_pmu_v3_get_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
>>>           irq = vcpu->arch.pmu.irq_num;
>>>           return put_user(irq, uaddr);
>>>       }
>>> +    case KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY:
>>> +        lockdep_assert_held(&vcpu->kvm->arch.config_lock);
>>> +        if (test_bit(KVM_ARCH_FLAG_PMU_V3_FIXED_COUNTERS_ONLY, &vcpu->kvm->arch.flags))
>>> +            return 0;
>>
>> We don't need a getter for this, userspace should remember how it
>> provisioned the VM.
> 
> The getter is useful for debugging and testing. The selftest will use it 
> to query the current state.
> 
> Regards,
> Akihiko Odaki

More information about the linux-arm-kernel mailing list