[PATCH v8 20/20] KVM: ARM64: Add a new kvm ARM PMU device

Fri Jan 8 03:22:13 PST 2016

On Fri, Jan 08, 2016 at 10:53:03AM +0800, Shannon Zhao wrote:
> 
> 
> On 2016/1/8 4:18, Andrew Jones wrote:
> > On Tue, Dec 22, 2015 at 04:08:15PM +0800, Shannon Zhao wrote:
> >> From: Shannon Zhao <shannon.zhao at linaro.org>
> >>
> >> Add a new kvm device type KVM_DEV_TYPE_ARM_PMU_V3 for ARM PMU. Implement
> >> the kvm_device_ops for it.
> >>
> >> Signed-off-by: Shannon Zhao <shannon.zhao at linaro.org>
> >> ---
> >>  Documentation/virtual/kvm/devices/arm-pmu.txt |  24 +++++
> >>  arch/arm64/include/uapi/asm/kvm.h             |   4 +
> >>  include/linux/kvm_host.h                      |   1 +
> >>  include/uapi/linux/kvm.h                      |   2 +
> >>  virt/kvm/arm/pmu.c                            | 128 ++++++++++++++++++++++++++
> >>  virt/kvm/kvm_main.c                           |   4 +
> >>  6 files changed, 163 insertions(+)
> >>  create mode 100644 Documentation/virtual/kvm/devices/arm-pmu.txt
> >>
> >> diff --git a/Documentation/virtual/kvm/devices/arm-pmu.txt b/Documentation/virtual/kvm/devices/arm-pmu.txt
> >> new file mode 100644
> >> index 0000000..dda864e
> >> --- /dev/null
> >> +++ b/Documentation/virtual/kvm/devices/arm-pmu.txt
> >> @@ -0,0 +1,24 @@
> >> +ARM Virtual Performance Monitor Unit (vPMU)
> >> +===========================================
> >> +
> >> +Device types supported:
> >> +  KVM_DEV_TYPE_ARM_PMU_V3         ARM Performance Monitor Unit v3
> >> +
> >> +Instantiate one PMU instance for per VCPU through this API.
> >                                 ^^ don't need this 'for' here
> > 
> >> +
> >> +Groups:
> >> +  KVM_DEV_ARM_PMU_GRP_IRQ
> >> +  Attributes:
> >> +    The attr field of kvm_device_attr encodes one value:
> >> +    bits:     | 63 .... 32 | 31 ....  0 |
> >> +    values:   |  reserved  | vcpu_index |
> > 
> > Everywhere else in kvm documentation a vcpu_index is 8 bits. I'm not
> > saying that that's good, but expanding it to 32 bits here is
> > inconsistent. 
> Expand it to 32 bits just in case it needs to support more than 256
> vcpus in the future. If you think this is not good, I'll change it to 8
> bits.

I agree that eventually we'll want more than 256 vcpus. However the kvm
api already has the concept of a vcpu-index, which shows up in two other
places, KVM_IRQ_LINE and KVM_DEV_ARM_VGIC_*. In both those other places
it's only 8 bits. While there may not be a KVM_VCPU_INDEX_MASK=0xff
somewhere, I still think we should keep things consistent.

Hmm, or each device should define its own index. Actually, that sounds
better, as each device may have a different max-vcpus limit; KVM_IRQ_LINE
has 8 bits due to the x86 apic supporting 8 bits, gicv2 could have gotten
by with only 3 bits, and gicv3 can use much, much more. When we want more
than 256 vcpus we'll need a new kvm-irq-line ioctl, but, as for the arm
devices, we could avoid the problem completely by extending the
SET/GET_DEVICE_ATTR ioctl to also be a vcpu ioctl.

> 
> (A side note is that I think we should s/vcpu_index/vcpu_id/
> > in order to keep the parameter name consistent with what's used by
> > KVM_CREATE_VCPU)
> > 
> Eh, I make it consistent with vGIC which is a kvm device that's also
> created via KVM_CREATE_DEVICE API. So they are different names but
> express same thing.

I knew you are consistent with arm-vgic.txt, and also with KVM_IRQ_LINE.
I was just thinking that it'd be nice for everything to be consistent
with KVM_CREATE_VCPU. However, now I'm of the mind that the vcpu-id
(32 bits) should always be independent of device vcpu-indexes, as we
don't want to limit the number of vcpus (by limiting vcpu-ids) to the
maximum of the most-limited device we support.

> 
> >> +    A value describing the PMU overflow interrupt number for the specified
> >> +    vcpu_index vcpu. This interrupt could be a PPI or SPI, but for one VM the
> >> +    interrupt type must be same for each vcpu. As a PPI, the interrupt number is
> >> +    same for all vcpus, while as a SPI it must be different for each vcpu.
> >> +
> >> +  Errors:
> >> +    -ENXIO: Unsupported attribute group
> > 
> > Do we need to specify ENXIO here? It's already specified in
> > Documentation/virtual/kvm/api.txt for SET/GET_DEVICE_ATTR
> > 
> But specifying it here is not bad, right? If you insist on this, I'll
> drop it.

It's fine, as the specification in SET/GET_DEVICE_ATTR section isn't
going to change, but we require users of this api to read that section
already, so I feel it's redundant and prefer to avoid redundancy. I won't
insist.

> 
> >> +    -EBUSY: The PMU overflow interrupt is already set
> >> +    -ENODEV: Getting the PMU overflow interrupt number while it's not set
> >> +    -EINVAL: Invalid vcpu_index or PMU overflow interrupt number supplied
> >> diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
> >> index 2d4ca4b..cbb9022 100644
> >> --- a/arch/arm64/include/uapi/asm/kvm.h
> >> +++ b/arch/arm64/include/uapi/asm/kvm.h
> >> @@ -204,6 +204,10 @@ struct kvm_arch_memory_slot {
> >>  #define KVM_DEV_ARM_VGIC_GRP_CTRL	4
> >>  #define   KVM_DEV_ARM_VGIC_CTRL_INIT	0
> >>  
> >> +/* Device Control API: ARM PMU */
> >> +#define KVM_DEV_ARM_PMU_GRP_IRQ		0
> >> +#define   KVM_DEV_ARM_PMU_CPUID_MASK	0xffffffffULL
> > 
> > I don't think we should need a mask like this, at least not in the uapi.
> > The documentation says to use a vcpu-index [really a vcpu-id], and users
> > should know how to convert their vcpu handle (whatever that may be) to
> > a vcpu-id using whatever transformation is necessary. They may already
> > have a "vcpu id mask" defined (this is one reason why I don't think we
> > should use a 32-bit vcpu-index here, instead of the 8-bit vcpu-index used
> > everywhere else). Likewise, kvm should know how to do it's transformation.
> > Maybe, to aid in the attr field construction, we should supply a shift,
> > allowing both sides to do something like
> > 
> > int pmu_attr_extract_vcpu_id(u64 attr)
> > {
> >     return (attr >> KVM_DEV_ARM_PMU_VCPUID_SHIFT) & VCPU_ID_MASK;
> So what's the value of the VCPU_ID_MASK? I didn't see any definitions
> about it. It looks like KVM_DEV_ARM_PMU_CPUID_MASK(just has a difference
> between 32 bits and 8 bits)

Whether or not a user would define a VCPU_ID_MASK == 0xff depends on
whether or not they understand 'vcpu-index' to be a consistent concept
everywhere they see it in the documentation. I now would prefer
vcpu-index being device specific. Going that way, then I think the
documentation could be updated to help make that clearer, e.g. calling
the gic vcpu-index gic_vcpu_index or something.

> 
> > }
> > 
> > u64 pmu_attr_create(int vcpu_id)
> > {
> >     return vcpu_id << KVM_DEV_ARM_PMU_VCPUID_SHIFT;
> > }
> > 
> > But, in this case the shift is zero, so it's not really necessary. In
> > any case, please add the 'V' for VCPU.
> > 
> >> +
> >>  /* KVM_IRQ_LINE irq field index values */
> >>  #define KVM_ARM_IRQ_TYPE_SHIFT		24
> >>  #define KVM_ARM_IRQ_TYPE_MASK		0xff
> >> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> >> index c923350..608dea6 100644
> >> --- a/include/linux/kvm_host.h
> >> +++ b/include/linux/kvm_host.h
> >> @@ -1161,6 +1161,7 @@ extern struct kvm_device_ops kvm_mpic_ops;
> >>  extern struct kvm_device_ops kvm_xics_ops;
> >>  extern struct kvm_device_ops kvm_arm_vgic_v2_ops;
> >>  extern struct kvm_device_ops kvm_arm_vgic_v3_ops;
> >> +extern struct kvm_device_ops kvm_arm_pmu_ops;
> >>  
> >>  #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT
> >>  
> >> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> >> index 03f3618..4ba6fdd 100644
> >> --- a/include/uapi/linux/kvm.h
> >> +++ b/include/uapi/linux/kvm.h
> >> @@ -1032,6 +1032,8 @@ enum kvm_device_type {
> >>  #define KVM_DEV_TYPE_FLIC		KVM_DEV_TYPE_FLIC
> >>  	KVM_DEV_TYPE_ARM_VGIC_V3,
> >>  #define KVM_DEV_TYPE_ARM_VGIC_V3	KVM_DEV_TYPE_ARM_VGIC_V3
> >> +	KVM_DEV_TYPE_ARM_PMU_V3,
> >> +#define	KVM_DEV_TYPE_ARM_PMU_V3		KVM_DEV_TYPE_ARM_PMU_V3
> >>  	KVM_DEV_TYPE_MAX,
> >>  };
> >>  
> >> diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c
> >> index 3ec3cdd..5518308 100644
> >> --- a/virt/kvm/arm/pmu.c
> >> +++ b/virt/kvm/arm/pmu.c
> >> @@ -19,6 +19,7 @@
> >>  #include <linux/kvm.h>
> >>  #include <linux/kvm_host.h>
> >>  #include <linux/perf_event.h>
> >> +#include <linux/uaccess.h>
> >>  #include <asm/kvm_emulate.h>
> >>  #include <kvm/arm_pmu.h>
> >>  #include <kvm/arm_vgic.h>
> >> @@ -374,3 +375,130 @@ void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, u64 data,
> >>  
> >>  	pmc->perf_event = event;
> >>  }
> >> +
> >> +static inline bool kvm_arm_pmu_initialized(struct kvm_vcpu *vcpu)
> >> +{
> >> +	return vcpu->arch.pmu.irq_num != -1;
> >> +}
> >> +
> >> +static int kvm_arm_pmu_irq_access(struct kvm *kvm, struct kvm_device_attr *attr,
> >> +				  int *irq, bool is_set)
> >> +{
> >> +	int cpuid;
> > 
> > please call this vcpu_id
> > 
> >> +	struct kvm_vcpu *vcpu;
> >> +	struct kvm_pmu *pmu;
> >> +
> >> +	cpuid = attr->attr & KVM_DEV_ARM_PMU_CPUID_MASK;
> >> +	if (cpuid >= atomic_read(&kvm->online_vcpus))
> >> +		return -EINVAL;
> >> +
> >> +	vcpu = kvm_get_vcpu(kvm, cpuid);
> >> +	if (!vcpu)
> >> +		return -EINVAL;
> >> +
> >> +	pmu = &vcpu->arch.pmu;
> >> +	if (!is_set) {
> >> +		if (!kvm_arm_pmu_initialized(vcpu))
> >> +			return -ENODEV;
> >> +
> >> +		*irq = pmu->irq_num;
> >> +	} else {
> >> +		if (kvm_arm_pmu_initialized(vcpu))
> >> +			return -EBUSY;
> >> +
> >> +		kvm_debug("Set kvm ARM PMU irq: %d\n", *irq);
> >> +		pmu->irq_num = *irq;
> >> +	}
> >> +
> >> +	return 0;
> >> +}
> >> +
> >> +static int kvm_arm_pmu_create(struct kvm_device *dev, u32 type)
> >> +{
> >> +	int i;
> >> +	struct kvm_vcpu *vcpu;
> >> +	struct kvm *kvm = dev->kvm;
> >> +
> >> +	kvm_for_each_vcpu(i, vcpu, kvm) {
> >> +		struct kvm_pmu *pmu = &vcpu->arch.pmu;
> >> +
> >> +		memset(pmu, 0, sizeof(*pmu));
> > 
> > I don't think we want this memset. If we can only create the pmu
> > once, then it's unnecessary (we zalloc vcpus). And, if we can
> > recreate pmus with this call, then it'll create a memory leak, as
> > we'll be zero-ing out all the perf_event pointers, and then won't
> > be able to free them on the call to kvm_pmu_vcpu_reset. Naturally
> > we need to make sure we're NULL-ing them after each free instead.
> > 
> Ok, will drop this.
> 
> >> +		kvm_pmu_vcpu_reset(vcpu);
> >> +		pmu->irq_num = -1;
> >> +	}
> >> +
> >> +	return 0;
> >> +}
> >> +
> >> +static void kvm_arm_pmu_destroy(struct kvm_device *dev)
> >> +{
> >> +	kfree(dev);
> >> +}
> >> +
> >> +static int kvm_arm_pmu_set_attr(struct kvm_device *dev,
> >> +				struct kvm_device_attr *attr)
> >> +{
> >> +	switch (attr->group) {
> >> +	case KVM_DEV_ARM_PMU_GRP_IRQ: {
> >> +		int __user *uaddr = (int __user *)(long)attr->addr;
> >> +		int reg;
> >> +
> >> +		if (get_user(reg, uaddr))
> >> +			return -EFAULT;
> >> +
> >> +		/*
> >> +		 * The PMU overflow interrupt could be a PPI or SPI, but for one
> >> +		 * VM the interrupt type must be same for each vcpu. As a PPI,
> >> +		 * the interrupt number is same for all vcpus, while as a SPI it
> >> +		 * must be different for each vcpu.
> >> +		 */
> >> +		if (reg < VGIC_NR_SGIS || reg >= dev->kvm->arch.vgic.nr_irqs)
> >> +			return -EINVAL;
> >> +
> >> +		return kvm_arm_pmu_irq_access(dev->kvm, attr, &reg, true);
> >> +	}
> >> +	}
> >> +
> >> +	return -ENXIO;
> >> +}
> >> +
> >> +static int kvm_arm_pmu_get_attr(struct kvm_device *dev,
> >> +				struct kvm_device_attr *attr)
> >> +{
> >> +	int ret;
> >> +
> >> +	switch (attr->group) {
> >> +	case KVM_DEV_ARM_PMU_GRP_IRQ: {
> >> +		int __user *uaddr = (int __user *)(long)attr->addr;
> >> +		int reg = -1;
> >> +
> >> +
> >> +		ret = kvm_arm_pmu_irq_access(dev->kvm, attr, &reg, false);
> >> +		if (ret)
> >> +			return ret;
> >> +		return put_user(reg, uaddr);
> >> +	}
> >> +	}
> >> +
> >> +	return -ENXIO;
> >> +}
> >> +
> >> +static int kvm_arm_pmu_has_attr(struct kvm_device *dev,
> >> +				struct kvm_device_attr *attr)
> >> +{
> >> +	switch (attr->group) {
> >> +	case KVM_DEV_ARM_PMU_GRP_IRQ:
> >> +		return 0;
> >> +	}
> >> +
> >> +	return -ENXIO;
> >> +}
> >> +
> >> +struct kvm_device_ops kvm_arm_pmu_ops = {
> >> +	.name = "kvm-arm-pmu",
> >> +	.create = kvm_arm_pmu_create,
> >> +	.destroy = kvm_arm_pmu_destroy,
> >> +	.set_attr = kvm_arm_pmu_set_attr,
> >> +	.get_attr = kvm_arm_pmu_get_attr,
> >> +	.has_attr = kvm_arm_pmu_has_attr,
> >> +};
> >> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> >> index 484079e..81a42cc 100644
> >> --- a/virt/kvm/kvm_main.c
> >> +++ b/virt/kvm/kvm_main.c
> >> @@ -2647,6 +2647,10 @@ static struct kvm_device_ops *kvm_device_ops_table[KVM_DEV_TYPE_MAX] = {
> >>  #ifdef CONFIG_KVM_XICS
> >>  	[KVM_DEV_TYPE_XICS]		= &kvm_xics_ops,
> >>  #endif
> >> +
> >> +#ifdef CONFIG_KVM_ARM_PMU
> >> +	[KVM_DEV_TYPE_ARM_PMU_V3]	= &kvm_arm_pmu_ops,
> > 
> > Shouldn't we specify 'v3' in the kvm_arm_pmu_ops name, as we do with the
> > device type name?
> > 
> Sure, will add.
> 
> >> +#endif
> >>  };
> >>  
> >>  int kvm_register_device_ops(struct kvm_device_ops *ops, u32 type)
> >> -- 
> >> 2.0.4
> >>

Thanks,
drew