[PATCH v2] KVM: arm64: Allow to limit number of PMU counters
Alexander Graf
graf at amazon.com
Thu Sep 10 12:46:32 EDT 2020
On 10.09.20 17:52, Robin Murphy wrote:
>
> On 2020-09-10 11:18, Alexander Graf wrote:
>>
>>
>> On 10.09.20 12:06, Marc Zyngier wrote:
>>>
>>> On 2020-09-08 21:57, Alexander Graf wrote:
>>>> We currently pass through the number of PMU counters that we have
>>>> available
>>>> in hardware to guests. So if my host supports 10 concurrently active
>>>> PMU
>>>> counters, my guest will be able to spawn 10 counters as well.
>>>>
>>>> This is undesireable if we also want to use the PMU on the host for
>>>> monitoring. In that case, we want to split the PMU between guest and
>>>> host.
>>>>
>>>> To help that case, let's add a PMU attr that allows us to limit the
>>>> number
>>>> of PMU counters that we expose. With this patch in place, user space
>>>> can
>>>> keep some counters free for host use.
>>>>
>>>> Signed-off-by: Alexander Graf <graf at amazon.com>
>>>>
>>>> ---
>>>>
>>>> Because this patch touches the same code paths as the vPMU filtering
>>>> one
>>>> and the vPMU filtering generalized a few conditions in the attr path,
>>>> I've based it on top. Please let me know if you want it independent
>>>> instead.
>>>>
>>>> v1 -> v2:
>>>>
>>>> - Add documentation
>>>> - Add read support
>>>> ---
>>>> Documentation/virt/kvm/devices/vcpu.rst | 25 +++++++++++++++++++++++++
>>>> arch/arm64/include/uapi/asm/kvm.h | 7 ++++---
>>>> arch/arm64/kvm/pmu-emul.c | 32
>>>> ++++++++++++++++++++++++++++++++
>>>> arch/arm64/kvm/sys_regs.c | 5 +++++
>>>> include/kvm/arm_pmu.h | 1 +
>>>> 5 files changed, 67 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/Documentation/virt/kvm/devices/vcpu.rst
>>>> b/Documentation/virt/kvm/devices/vcpu.rst
>>>> index 203b91e93151..1a1c8d8c8b1d 100644
>>>> --- a/Documentation/virt/kvm/devices/vcpu.rst
>>>> +++ b/Documentation/virt/kvm/devices/vcpu.rst
>>>> @@ -102,6 +102,31 @@ isn't strictly speaking an event. Filtering the
>>>> cycle counter is possible
>>>> using event 0x11 (CPU_CYCLES).
>>>>
>>>>
>>>> +1.4 ATTRIBUTE: KVM_ARM_VCPU_PMU_V3_NUM_EVENTS
>>>> +---------------------------------------------
>>>> +
>>>> +:Parameters: in kvm_device_attr.addr the address for the limit of
>>>> concurrent
>>>> + events is a pointer to an int
>>>> +
>>>> +:Returns:
>>>> +
>>>> + ======= ======================================================
>>>> + -ENODEV: PMUv3 not supported
>>>> + -EBUSY: PMUv3 already initialized
>>>> + -EINVAL: Too large number of events
>>>> + ======= ======================================================
>>>> +
>>>> +Reconfigure the limit of concurrent PMU events that the guest can
>>>> monitor.
>>>> +This number is directly exposed as part of the PMCR_EL0 register.
>>>> +
>>>> +On vcpu creation, this attribute is set to the hardware limit of the
>>>> current
>>>> +platform. If you need to determine the hardware limit, you can read
>>>> this
>>>> +attribute before setting it.
>>>> +
>>>> +Restrictions: The default value for this property is the number of
>>>> hardware
>>>> +supported events. Only values that are smaller than the hardware limit
>>>> can
>>>> +be set.
>>>> +
>>>> 2. GROUP: KVM_ARM_VCPU_TIMER_CTRL
>>>> =================================
>>>>
>>>> diff --git a/arch/arm64/include/uapi/asm/kvm.h
>>>> b/arch/arm64/include/uapi/asm/kvm.h
>>>> index 7b1511d6ce44..db025c0b5a40 100644
>>>> --- a/arch/arm64/include/uapi/asm/kvm.h
>>>> +++ b/arch/arm64/include/uapi/asm/kvm.h
>>>> @@ -342,9 +342,10 @@ struct kvm_vcpu_events {
>>>>
>>>> /* Device Control API on vcpu fd */
>>>> #define KVM_ARM_VCPU_PMU_V3_CTRL 0
>>>> -#define KVM_ARM_VCPU_PMU_V3_IRQ 0
>>>> -#define KVM_ARM_VCPU_PMU_V3_INIT 1
>>>> -#define KVM_ARM_VCPU_PMU_V3_FILTER 2
>>>> +#define KVM_ARM_VCPU_PMU_V3_IRQ 0
>>>> +#define KVM_ARM_VCPU_PMU_V3_INIT 1
>>>> +#define KVM_ARM_VCPU_PMU_V3_FILTER 2
>>>> +#define KVM_ARM_VCPU_PMU_V3_NUM_EVENTS 3
>>>> #define KVM_ARM_VCPU_TIMER_CTRL 1
>>>> #define KVM_ARM_VCPU_TIMER_IRQ_VTIMER 0
>>>> #define KVM_ARM_VCPU_TIMER_IRQ_PTIMER 1
>>>> diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
>>>> index 0458860bade2..c7915b95fec0 100644
>>>> --- a/arch/arm64/kvm/pmu-emul.c
>>>> +++ b/arch/arm64/kvm/pmu-emul.c
>>>> @@ -253,6 +253,8 @@ void kvm_pmu_vcpu_init(struct kvm_vcpu *vcpu)
>>>>
>>>> for (i = 0; i < ARMV8_PMU_MAX_COUNTERS; i++)
>>>> pmu->pmc[i].idx = i;
>>>> +
>>>> + pmu->num_events = perf_num_counters() - 1;
>>>> }
>>>>
>>>> /**
>>>> @@ -978,6 +980,25 @@ int kvm_arm_pmu_v3_set_attr(struct kvm_vcpu
>>>> *vcpu, struct kvm_device_attr *attr)
>>>>
>>>> return 0;
>>>> }
>>>> + case KVM_ARM_VCPU_PMU_V3_NUM_EVENTS: {
>>>> + u64 mask = ARMV8_PMU_PMCR_N_MASK <<
>>>> ARMV8_PMU_PMCR_N_SHIFT;
>>>> + int __user *uaddr = (int __user *)(long)attr->addr;
>>>> + u32 num_events;
>>>> +
>>>> + if (get_user(num_events, uaddr))
>>>> + return -EFAULT;
>>>> +
>>>> + if (num_events >= perf_num_counters())
>>>> + return -EINVAL;
>>>> +
>>>> + vcpu->arch.pmu.num_events = num_events;
>>>> +
>>>> + num_events <<= ARMV8_PMU_PMCR_N_SHIFT;
>>>> + __vcpu_sys_reg(vcpu, SYS_PMCR_EL0) &= ~mask;
>>>> + __vcpu_sys_reg(vcpu, SYS_PMCR_EL0) |= num_events;
>>>> +
>>>> + return 0;
>>>> + }
>>>> case KVM_ARM_VCPU_PMU_V3_INIT:
>>>> return kvm_arm_pmu_v3_init(vcpu);
>>>> }
>>>> @@ -1004,6 +1025,16 @@ int kvm_arm_pmu_v3_get_attr(struct kvm_vcpu
>>>> *vcpu, struct kvm_device_attr *attr)
>>>> irq = vcpu->arch.pmu.irq_num;
>>>> return put_user(irq, uaddr);
>>>> }
>>>> + case KVM_ARM_VCPU_PMU_V3_NUM_EVENTS: {
>>>> + int __user *uaddr = (int __user *)(long)attr->addr;
>>>> + u32 num_events;
>>>> +
>>>> + if (!test_bit(KVM_ARM_VCPU_PMU_V3, vcpu->arch.features))
>>>> + return -ENODEV;
>>>> +
>>>> + num_events = vcpu->arch.pmu.num_events;
>>>> + return put_user(num_events, uaddr);
>>>> + }
>>>> }
>>>>
>>>> return -ENXIO;
>>>> @@ -1015,6 +1046,7 @@ int kvm_arm_pmu_v3_has_attr(struct kvm_vcpu
>>>> *vcpu, struct kvm_device_attr *attr)
>>>> case KVM_ARM_VCPU_PMU_V3_IRQ:
>>>> case KVM_ARM_VCPU_PMU_V3_INIT:
>>>> case KVM_ARM_VCPU_PMU_V3_FILTER:
>>>> + case KVM_ARM_VCPU_PMU_V3_NUM_EVENTS:
>>>> if (kvm_arm_support_pmu_v3() &&
>>>> test_bit(KVM_ARM_VCPU_PMU_V3, vcpu->arch.features))
>>>> return 0;
>>>> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
>>>> index 20ab2a7d37ca..d51e39600bbd 100644
>>>> --- a/arch/arm64/kvm/sys_regs.c
>>>> +++ b/arch/arm64/kvm/sys_regs.c
>>>> @@ -672,6 +672,11 @@ static void reset_pmcr(struct kvm_vcpu *vcpu,
>>>> const struct sys_reg_desc *r)
>>>> | (ARMV8_PMU_PMCR_MASK & 0xdecafbad)) &
>>>> (~ARMV8_PMU_PMCR_E);
>>>> if (!system_supports_32bit_el0())
>>>> val |= ARMV8_PMU_PMCR_LC;
>>>> +
>>>> + /* Override number of event selectors */
>>>> + val &= ~(ARMV8_PMU_PMCR_N_MASK << ARMV8_PMU_PMCR_N_SHIFT);
>>>> + val |= (u32)vcpu->arch.pmu.num_events << ARMV8_PMU_PMCR_N_SHIFT;
>>>> +
>>>> __vcpu_sys_reg(vcpu, r->reg) = val;
>>>> }
>>>>
>>>> diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
>>>> index 98cbfe885a53..ea3fc96a37d9 100644
>>>> --- a/include/kvm/arm_pmu.h
>>>> +++ b/include/kvm/arm_pmu.h
>>>> @@ -27,6 +27,7 @@ struct kvm_pmu {
>>>> bool ready;
>>>> bool created;
>>>> bool irq_level;
>>>> + u8 num_events;
>>>> };
>>>>
>>>> #define kvm_arm_pmu_v3_ready(v) ((v)->arch.pmu.ready)
>>>
>>> I see several problems with this approach:
>>>
>>> - userspace doesn't really have a good way to retrieve the number of
>>> counters.
>> It does with v2, because it can then just read the register ;). I agree
>> that it's clunky though.
>>
>>>
>>> - Limiting the number of counters for the guest doesn't mean anything
>>> when it comes to the actual use of the HW counters, given that we
>>> don't allocate them ourselves (it's all perf doing the actual work).
>>
>> We do cap the number of actively requestable counters via perf by the
>> PMCR.N limit. So in a way, it does mean something.
>>
>>> - If you want to "pin" counters for the host, why don't you just do
>>> that before starting the guest?
>>
>> You can do that. Imagine I have 10 counters. I pin 4 of them to the
>> host. I still tell my guest that it can use 6. That means perf will then
>> time slice and juggle 10 guest event counters on those remaining 6
>> hardware counters. That juggling heavily reduces accuracy.
>>
>>> I think you need to look at the bigger picture: how to limit the use
>>> of physical counter usage for a given userspace task. This needs
>>> to happen in perf itself, and not in KVM.
>>
>> That's definitely another way to look at it that I agree with.
>>
>> What we really want is to expose the number of counters the guest has
>> available, not the number of counters hardware can support at maximum.
>>
>> So in theory it would be enough to ask perf how many counters it does
>> have free for me to consume without overcommitting. But that would
>> potentially change between multiple invocations of KVM and thus break
>> things like live migration, no?
>>
>> Maybe what we really want is an interface to perf from user space to say
>> "how many counters can you dedicate to me?" and "reserve them for me".
>> Then user space could reserve them as dedicated counters and KVM would
>> just need to either probe for the reservation or get told by user space
>> what to expose via ONE_REG as Drew suggested. It'd be up to user space
>> to ensure that the reservation matches the number of exposed counters
>> then.
>
> Note that if the aim is to avoid the guest seeing unexpectedly weird
> behaviour, then it's not just the *number* of counters that matters, but
> the underlying physical allocation too, thanks to the possibility of
> chained events.
Wouldn't ideally guest chaining propagate into host chaining as well?
I'd have to double check if it does, but in my naive thinking if I
reserve 4 hardware counters for the guest and the guest ends up using 4
hardware counters regardless of their chaining attributes, I'd still be
able to fit them all?
Alex
Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879
More information about the linux-arm-kernel
mailing list