[PATCH v2] KVM: arm64: Allow to limit number of PMU counters
Robin Murphy
robin.murphy at arm.com
Thu Sep 10 11:52:46 EDT 2020
On 2020-09-10 11:18, Alexander Graf wrote:
>
>
> On 10.09.20 12:06, Marc Zyngier wrote:
>>
>> On 2020-09-08 21:57, Alexander Graf wrote:
>>> We currently pass through the number of PMU counters that we have
>>> available in hardware to guests. So if my host supports 10
>>> concurrently active PMU counters, my guest will be able to spawn 10
>>> counters as well.
>>>
>>> This is undesirable if we also want to use the PMU on the host for
>>> monitoring. In that case, we want to split the PMU between guest and
>>> host.
>>>
>>> To help that case, let's add a PMU attr that allows us to limit the
>>> number of PMU counters that we expose. With this patch in place,
>>> user space can keep some counters free for host use.
>>>
>>> Signed-off-by: Alexander Graf <graf at amazon.com>
>>>
>>> ---
>>>
>>> Because this patch touches the same code paths as the vPMU filtering
>>> one and the vPMU filtering generalized a few conditions in the attr
>>> path, I've based it on top. Please let me know if you want it
>>> independent instead.
>>>
>>> v1 -> v2:
>>>
>>> - Add documentation
>>> - Add read support
>>> ---
>>> Documentation/virt/kvm/devices/vcpu.rst | 25 +++++++++++++++++++++++++
>>> arch/arm64/include/uapi/asm/kvm.h | 7 ++++---
>>> arch/arm64/kvm/pmu-emul.c | 32 ++++++++++++++++++++++++++++++++
>>> arch/arm64/kvm/sys_regs.c | 5 +++++
>>> include/kvm/arm_pmu.h | 1 +
>>> 5 files changed, 67 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/Documentation/virt/kvm/devices/vcpu.rst b/Documentation/virt/kvm/devices/vcpu.rst
>>> index 203b91e93151..1a1c8d8c8b1d 100644
>>> --- a/Documentation/virt/kvm/devices/vcpu.rst
>>> +++ b/Documentation/virt/kvm/devices/vcpu.rst
>>> @@ -102,6 +102,31 @@ isn't strictly speaking an event. Filtering the cycle counter is possible
>>> using event 0x11 (CPU_CYCLES).
>>>
>>>
>>> +1.4 ATTRIBUTE: KVM_ARM_VCPU_PMU_V3_NUM_EVENTS
>>> +---------------------------------------------
>>> +
>>> +:Parameters: in kvm_device_attr.addr the address for the limit of concurrent
>>> +  events is a pointer to an int
>>> +
>>> +:Returns:
>>> +
>>> + ======= ======================================================
>>> + -ENODEV: PMUv3 not supported
>>> + -EBUSY: PMUv3 already initialized
>>> + -EINVAL: Number of events exceeds the hardware limit
>>> + ======= ======================================================
>>> +
>>> +Reconfigure the limit of concurrent PMU events that the guest can monitor.
>>> +This number is directly exposed as part of the PMCR_EL0 register.
>>> +
>>> +On vcpu creation, this attribute is set to the hardware limit of the current
>>> +platform. If you need to determine the hardware limit, you can read this
>>> +attribute before setting it.
>>> +
>>> +Restrictions: The default value for this property is the number of hardware
>>> +supported events. Only values that are smaller than the hardware limit can
>>> +be set.
>>> +
>>> 2. GROUP: KVM_ARM_VCPU_TIMER_CTRL
>>> =================================
>>>
>>> diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
>>> index 7b1511d6ce44..db025c0b5a40 100644
>>> --- a/arch/arm64/include/uapi/asm/kvm.h
>>> +++ b/arch/arm64/include/uapi/asm/kvm.h
>>> @@ -342,9 +342,10 @@ struct kvm_vcpu_events {
>>>
>>> /* Device Control API on vcpu fd */
>>> #define KVM_ARM_VCPU_PMU_V3_CTRL 0
>>> -#define KVM_ARM_VCPU_PMU_V3_IRQ 0
>>> -#define KVM_ARM_VCPU_PMU_V3_INIT 1
>>> -#define KVM_ARM_VCPU_PMU_V3_FILTER 2
>>> +#define KVM_ARM_VCPU_PMU_V3_IRQ 0
>>> +#define KVM_ARM_VCPU_PMU_V3_INIT 1
>>> +#define KVM_ARM_VCPU_PMU_V3_FILTER 2
>>> +#define KVM_ARM_VCPU_PMU_V3_NUM_EVENTS 3
>>> #define KVM_ARM_VCPU_TIMER_CTRL 1
>>> #define KVM_ARM_VCPU_TIMER_IRQ_VTIMER 0
>>> #define KVM_ARM_VCPU_TIMER_IRQ_PTIMER 1
>>> diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
>>> index 0458860bade2..c7915b95fec0 100644
>>> --- a/arch/arm64/kvm/pmu-emul.c
>>> +++ b/arch/arm64/kvm/pmu-emul.c
>>> @@ -253,6 +253,8 @@ void kvm_pmu_vcpu_init(struct kvm_vcpu *vcpu)
>>>
>>> for (i = 0; i < ARMV8_PMU_MAX_COUNTERS; i++)
>>> pmu->pmc[i].idx = i;
>>> +
>>> + pmu->num_events = perf_num_counters() - 1;
>>> }
>>>
>>> /**
>>> @@ -978,6 +980,25 @@ int kvm_arm_pmu_v3_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
>>>
>>> return 0;
>>> }
>>> + case KVM_ARM_VCPU_PMU_V3_NUM_EVENTS: {
>>> + u64 mask = ARMV8_PMU_PMCR_N_MASK << ARMV8_PMU_PMCR_N_SHIFT;
>>> + int __user *uaddr = (int __user *)(long)attr->addr;
>>> + u32 num_events;
>>> +
>>> + if (get_user(num_events, uaddr))
>>> + return -EFAULT;
>>> +
>>> + if (num_events >= perf_num_counters())
>>> + return -EINVAL;
>>> +
>>> + vcpu->arch.pmu.num_events = num_events;
>>> +
>>> + num_events <<= ARMV8_PMU_PMCR_N_SHIFT;
>>> + __vcpu_sys_reg(vcpu, SYS_PMCR_EL0) &= ~mask;
>>> + __vcpu_sys_reg(vcpu, SYS_PMCR_EL0) |= num_events;
>>> +
>>> + return 0;
>>> + }
>>> case KVM_ARM_VCPU_PMU_V3_INIT:
>>> return kvm_arm_pmu_v3_init(vcpu);
>>> }
>>> @@ -1004,6 +1025,16 @@ int kvm_arm_pmu_v3_get_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
>>> irq = vcpu->arch.pmu.irq_num;
>>> return put_user(irq, uaddr);
>>> }
>>> + case KVM_ARM_VCPU_PMU_V3_NUM_EVENTS: {
>>> + int __user *uaddr = (int __user *)(long)attr->addr;
>>> + u32 num_events;
>>> +
>>> + if (!test_bit(KVM_ARM_VCPU_PMU_V3, vcpu->arch.features))
>>> + return -ENODEV;
>>> +
>>> + num_events = vcpu->arch.pmu.num_events;
>>> + return put_user(num_events, uaddr);
>>> + }
>>> }
>>>
>>> return -ENXIO;
>>> @@ -1015,6 +1046,7 @@ int kvm_arm_pmu_v3_has_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
>>> case KVM_ARM_VCPU_PMU_V3_IRQ:
>>> case KVM_ARM_VCPU_PMU_V3_INIT:
>>> case KVM_ARM_VCPU_PMU_V3_FILTER:
>>> + case KVM_ARM_VCPU_PMU_V3_NUM_EVENTS:
>>> if (kvm_arm_support_pmu_v3() &&
>>> test_bit(KVM_ARM_VCPU_PMU_V3, vcpu->arch.features))
>>> return 0;
>>> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
>>> index 20ab2a7d37ca..d51e39600bbd 100644
>>> --- a/arch/arm64/kvm/sys_regs.c
>>> +++ b/arch/arm64/kvm/sys_regs.c
>>> @@ -672,6 +672,11 @@ static void reset_pmcr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
>>> | (ARMV8_PMU_PMCR_MASK & 0xdecafbad)) & (~ARMV8_PMU_PMCR_E);
>>> if (!system_supports_32bit_el0())
>>> val |= ARMV8_PMU_PMCR_LC;
>>> +
>>> + /* Override number of event selectors */
>>> + val &= ~(ARMV8_PMU_PMCR_N_MASK << ARMV8_PMU_PMCR_N_SHIFT);
>>> + val |= (u32)vcpu->arch.pmu.num_events << ARMV8_PMU_PMCR_N_SHIFT;
>>> +
>>> __vcpu_sys_reg(vcpu, r->reg) = val;
>>> }
>>>
>>> diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
>>> index 98cbfe885a53..ea3fc96a37d9 100644
>>> --- a/include/kvm/arm_pmu.h
>>> +++ b/include/kvm/arm_pmu.h
>>> @@ -27,6 +27,7 @@ struct kvm_pmu {
>>> bool ready;
>>> bool created;
>>> bool irq_level;
>>> + u8 num_events;
>>> };
>>>
>>> #define kvm_arm_pmu_v3_ready(v) ((v)->arch.pmu.ready)
>>
>> I see several problems with this approach:
>>
>> - userspace doesn't really have a good way to retrieve the number of
>> counters.
> It does with v2, because it can then just read the register ;). I agree
> that it's clunky though.
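>
> For completeness, the v2 flow is roughly this (sketch only, untested,
> error handling elided; vcpu_fd is an already-created vcpu file
> descriptor):
>
>     int num_events;
>     struct kvm_device_attr attr = {
>         .group = KVM_ARM_VCPU_PMU_V3_CTRL,
>         .attr  = KVM_ARM_VCPU_PMU_V3_NUM_EVENTS,
>         .addr  = (__u64)(unsigned long)&num_events,
>     };
>
>     /* defaults to the hardware limit, so a read probes it */
>     ioctl(vcpu_fd, KVM_GET_DEVICE_ATTR, &attr);
>
>     /* leave 4 counters to the host; must happen before
>      * KVM_ARM_VCPU_PMU_V3_INIT, or we'd get -EBUSY */
>     num_events -= 4;
>     ioctl(vcpu_fd, KVM_SET_DEVICE_ATTR, &attr);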
>
>>
>> - Limiting the number of counters for the guest doesn't mean anything
>> when it comes to the actual use of the HW counters, given that we
>> don't allocate them ourselves (it's all perf doing the actual work).
>
> We do cap the number of counters the guest can actively request via
> perf at the PMCR.N limit. So in a way, it does mean something.
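>
> (The cap works because KVM derives the set of counters the guest may
> use from PMCR_EL0.N; roughly, from pmu-emul.c:
>
>     u64 kvm_pmu_valid_counter_mask(struct kvm_vcpu *vcpu)
>     {
>         u64 val = __vcpu_sys_reg(vcpu, SYS_PMCR_EL0)
>                       >> ARMV8_PMU_PMCR_N_SHIFT;
>
>         val &= ARMV8_PMU_PMCR_N_MASK;
>         if (val == 0)
>             return BIT(ARMV8_PMU_CYCLE_IDX);
>
>         return GENMASK(val - 1, 0) | BIT(ARMV8_PMU_CYCLE_IDX);
>     }
>
> so events the guest tries to program beyond N never reach perf.)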
>
>> - If you want to "pin" counters for the host, why don't you just do
>> that before starting the guest?
>
> You can do that. Imagine I have 10 counters. I pin 4 of them to the
> host. I still tell my guest that it can use 10. That means perf will
> then time slice and juggle 10 guest event counters on the remaining 6
> hardware counters. That juggling heavily reduces accuracy.
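>
> (Pinning here meaning e.g. a host-side perf_event_open() with
> attr.pinned = 1; sketch, one event for illustration:
>
>     struct perf_event_attr pattr = {
>         .type   = PERF_TYPE_HARDWARE,
>         .size   = sizeof(pattr),
>         .config = PERF_COUNT_HW_CPU_CYCLES,
>         .pinned = 1,    /* always on the PMU, never multiplexed */
>     };
>     int fd = syscall(__NR_perf_event_open, &pattr,
>                      0 /* this task */, -1 /* any cpu */,
>                      -1 /* no group */, 0);
>
> A pinned event that can't get a hardware counter goes into error
> state rather than being rotated with the others.)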
>
>> I think you need to look at the bigger picture: how to limit the use
>> of physical counter usage for a given userspace task. This needs
>> to happen in perf itself, and not in KVM.
>
> That's definitely another way to look at it that I agree with.
>
> What we really want is to expose the number of counters the guest has
> available, not the number of counters hardware can support at maximum.
>
> So in theory it would be enough to ask perf how many counters it has
> free for me to consume without overcommitting. But that number could
> change between multiple invocations of KVM and thus break things like
> live migration, no?
>
> Maybe what we really want is an interface to perf from user space to
> say "how many counters can you dedicate to me?" and "reserve them for
> me". Then user space could reserve them as dedicated counters and KVM
> would just need to either probe for the reservation or get told by
> user space what to expose via ONE_REG, as Drew suggested. It'd then
> be up to user space to ensure that the reservation matches the number
> of exposed counters.
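>
> Something like this is what I mean (entirely hypothetical, no such
> perf interface exists today):
>
>     /* hypothetical: ask perf to set aside n counters exclusively */
>     int reserved = perf_reserve_counters(4);
>
> with user space then telling KVM to expose exactly that many.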
Note that if the aim is to avoid the guest seeing unexpectedly weird
behaviour, then it's not just the *number* of counters that matters, but
the underlying physical allocation too, thanks to the possibility of
chained events.
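
For illustration, a guest pairing counters 0 and 1 into one 64-bit
counter (sketch, assuming the kernel's write_sysreg() accessor):

    write_sysreg(0x11, pmevtyper0_el0);  /* even counter: CPU_CYCLES */
    write_sysreg(0x1e, pmevtyper1_el0);  /* odd counter: CHAIN */
    write_sysreg(0x3, pmcntenset_el0);   /* enable the pair */

works because CHAIN on an odd counter counts overflows of the even
counter directly below it, i.e. the pairing is defined in terms of
specific counter indices, not merely how many counters there are.
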
Robin.