[PATCH 1/5] KVM: arm64: Allow creating the PMU without the in-kernel GIC

Christoffer Dall cdall at linaro.org
Thu May 4 03:38:52 PDT 2017


On Thu, May 04, 2017 at 11:16:57AM +0100, Marc Zyngier wrote:
> On 04/05/17 10:44, Christoffer Dall wrote:
> > On Thu, May 04, 2017 at 10:05:43AM +0100, Marc Zyngier wrote:
> >> On 04/05/17 09:38, Christoffer Dall wrote:
> >>> On Thu, May 04, 2017 at 09:28:50AM +0100, Marc Zyngier wrote:
> >>>> On 04/05/17 09:13, Christoffer Dall wrote:
> >>>>> On Thu, May 04, 2017 at 09:09:47AM +0100, Marc Zyngier wrote:
> >>>>>> On 03/05/17 19:32, Christoffer Dall wrote:
> >>>>>>> Since we got support for devices in userspace which allows reporting the
> >>>>>>> PMU overflow output status to userspace, we should actually allow
> >>>>>>> creating the PMU on systems without an in-kernel irqchip, which in turn
> >>>>>>> requires us to slightly clarify error codes for the ABI and move things
> >>>>>>> around for the initialization phase.
> >>>>>>>
> >>>>>>> Signed-off-by: Christoffer Dall <cdall at linaro.org>
> >>>>>>> ---
> >>>>>>>  Documentation/virtual/kvm/devices/vcpu.txt | 16 +++++++++-------
> >>>>>>>  virt/kvm/arm/pmu.c                         | 27 +++++++++++++++++----------
> >>>>>>>  2 files changed, 26 insertions(+), 17 deletions(-)
> >>>>>>>
> >>>>>>> diff --git a/Documentation/virtual/kvm/devices/vcpu.txt b/Documentation/virtual/kvm/devices/vcpu.txt
> >>>>>>> index 02f5068..352af6e 100644
> >>>>>>> --- a/Documentation/virtual/kvm/devices/vcpu.txt
> >>>>>>> +++ b/Documentation/virtual/kvm/devices/vcpu.txt
> >>>>>>> @@ -16,7 +16,9 @@ Parameters: in kvm_device_attr.addr the address for PMU overflow interrupt is a
> >>>>>>>  Returns: -EBUSY: The PMU overflow interrupt is already set
> >>>>>>>           -ENXIO: The overflow interrupt not set when attempting to get it
> >>>>>>>           -ENODEV: PMUv3 not supported
> >>>>>>> -         -EINVAL: Invalid PMU overflow interrupt number supplied
> >>>>>>> +         -EINVAL: Invalid PMU overflow interrupt number supplied or
> >>>>>>> +                  trying to set the IRQ number without using an in-kernel
> >>>>>>> +                  irqchip.
> >>>>>>>  
> >>>>>>>  A value describing the PMUv3 (Performance Monitor Unit v3) overflow interrupt
> >>>>>>>  number for this vcpu. This interrupt could be a PPI or SPI, but the interrupt
> >>>>>>> @@ -25,11 +27,11 @@ all vcpus, while as an SPI it must be a separate number per vcpu.
> >>>>>>>  
> >>>>>>>  1.2 ATTRIBUTE: KVM_ARM_VCPU_PMU_V3_INIT
> >>>>>>>  Parameters: no additional parameter in kvm_device_attr.addr
> >>>>>>> -Returns: -ENODEV: PMUv3 not supported
> >>>>>>> -         -ENXIO: PMUv3 not properly configured as required prior to calling this
> >>>>>>> -                 attribute
> >>>>>>> +Returns: -ENODEV: PMUv3 not supported or GIC not initialized
> >>>>>>> +         -ENXIO: PMUv3 not properly configured or in-kernel irqchip not
> >>>>>>> +                 conigured as required prior to calling this attribute
> >>>>>>>           -EBUSY: PMUv3 already initialized
> >>>>>>>  
> >>>>>>> -Request the initialization of the PMUv3.  This must be done after creating the
> >>>>>>> -in-kernel irqchip.  Creating a PMU with a userspace irqchip is currently not
> >>>>>>> -supported.
> >>>>>>> +Request the initialization of the PMUv3.  If using the PMUv3 with an in-kernel
> >>>>>>> +virtual GIC implementation, this must be done after initializing the in-kernel
> >>>>>>> +irqchip.
> >>>>>>> diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c
> >>>>>>> index 4b43e7f..f046b08 100644
> >>>>>>> --- a/virt/kvm/arm/pmu.c
> >>>>>>> +++ b/virt/kvm/arm/pmu.c
> >>>>>>> @@ -456,21 +456,25 @@ static int kvm_arm_pmu_v3_init(struct kvm_vcpu *vcpu)
> >>>>>>>  	if (!kvm_arm_support_pmu_v3())
> >>>>>>>  		return -ENODEV;
> >>>>>>>  
> >>>>>>> -	/*
> >>>>>>> -	 * We currently require an in-kernel VGIC to use the PMU emulation,
> >>>>>>> -	 * because we do not support forwarding PMU overflow interrupts to
> >>>>>>> -	 * userspace yet.
> >>>>>>> -	 */
> >>>>>>> -	if (!irqchip_in_kernel(vcpu->kvm) || !vgic_initialized(vcpu->kvm))
> >>>>>>> -		return -ENODEV;
> >>>>>>> -
> >>>>>>> -	if (!test_bit(KVM_ARM_VCPU_PMU_V3, vcpu->arch.features) ||
> >>>>>>> -	    !kvm_arm_pmu_irq_initialized(vcpu))
> >>>>>>> +	if (!test_bit(KVM_ARM_VCPU_PMU_V3, vcpu->arch.features))
> >>>>>>>  		return -ENXIO;
> >>>>>>>  
> >>>>>>>  	if (kvm_arm_pmu_v3_ready(vcpu))
> >>>>>>>  		return -EBUSY;
> >>>>>>>  
> >>>>>>> +	if (irqchip_in_kernel(vcpu->kvm)) {
> >>>>>>> +		/*
> >>>>>>> +		 * If using the PMU with an in-kernel virtual GIC
> >>>>>>> +		 * implementation, we require the GIC to be already
> >>>>>>> +		 * initialized when initializing the PMU.
> >>>>>>> +		 */
> >>>>>>> +		if (!vgic_initialized(vcpu->kvm))
> >>>>>>> +			return -ENODEV;
> >>>>>>> +
> >>>>>>> +		if (!kvm_arm_pmu_irq_initialized(vcpu))
> >>>>>>> +			return -ENXIO;
> >>>>>>> +	}
> >>>>>>> +
> >>>>>>>  	kvm_pmu_vcpu_reset(vcpu);
> >>>>>>>  	vcpu->arch.pmu.ready = true;
> >>>>>>>  
> >>>>>>> @@ -512,6 +516,9 @@ int kvm_arm_pmu_v3_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
> >>>>>>>  		int __user *uaddr = (int __user *)(long)attr->addr;
> >>>>>>>  		int irq;
> >>>>>>>  
> >>>>>>> +		if (!irqchip_in_kernel(vcpu->kvm))
> >>>>>>> +			return -EINVAL;
> >>>>>>> +
> >>>>>>
> >>>>>> Shouldn't we fail the same way for {get,has}_attr? get_attr is going to
> >>>>>> generate a -ENXIO, and has_attr is going to lie about the availability
> >>>>>> of KVM_ARM_VCPU_PMU_V3_IRQ...
> >>>>>>
> >>>>>
> >>>>> Here's the text from api.txt:
> >>>>>
> >>>>>   Tests whether a device supports a particular attribute.  A successful
> >>>>>   return indicates the attribute is implemented.  It does not necessarily
> >>>>>   indicate that the attribute can be read or written in the device's
> >>>>>   current state.  "addr" is ignored.
> >>>>>
> >>>>> My interpretation therefore is that QEMU can use this ioctl to figure
> >>>>> out if the feature is supported (sort of like a capability), but that
> >>>>> doesn't mean that the configuration of the VM is such that the attribute
> >>>>> can be get or set at that moment.
> >>>>>
> >>>>> For example, there will also alway be situations where you can get an
> >>>>> attr, but not set an attr, what should the has_attr return then?
> >>>>
> >>>> My issue here is that whether we can get/set the interrupt or not is not
> >>>> a function of the device itself, but of the way it is "wired". No matter
> >>>> what "the device's current state" is, we'll never be able to get/set the
> >>>> interrupt.
> >>>>
> >>>> I'd tend to err on the side of caution and return something that is
> >>>> unambiguous, be maybe I have too strict an interpretation of the API.
> >>>>
> >>>
> >>> Hmm, I see the has_attr as a method for userspace to discover "does this
> >>> kernel have this feature".  If we make has_attr return a value depending
> >>> on the VM having an in-kernel gic or not, we implicitly require
> >>> userspace to create a VM with an in-kernel GIC to discover if this
> >>> kernel has that feature, and therefore also impose an ordering
> >>> requirement of figuring out the capabilities of the kernel (i.e. create
> >>> the GIC before checking this API).
> >>>
> >>> I think QEMU should be able to do:
> >>>
> >>>   if (has_attr()) {
> >>>      kvm_supports_set_timer_irq = true;
> >>>      vtimer_irq = foo;
> >>>   } else {
> >>>      kvm_supports_set_timer_irq = false;
> >>>      vtimer_irq = 27; /* default, we're stuck with it */
> >>>   }
> >>>
> >>>   create_board_definition();
> >>>   create_dt();
> >>>   create_acpi();
> >>>
> >>>   /* do whatever */
> >>>
> >>>   if (kvm_supports_set_timer_irq && kvm_irqchip_in_kernel()) {
> >>>   	kvm_arm_timer_set_irq(...);
> >>>   }
> >>>
> >>> And all this should not be coupled to when we create the irqchip device.
> >>>
> >>> But I may be failing to see the case where the current implementation
> >>> creates a problem for userspace, in which case we should document the
> >>> ordering requirement.
> >>
> >> I'm not sure it would create any problem, at least not for the PMU
> >> (there is no working code that would have created a PMU without an irqchip).
> >>
> >> It is just that we have a slightly diverging interpretation of what
> >> has_attr means. You see it as "attribute that the device supports", and
> >> I see it as "attribute that the device supports in this configuration".
> >> I'm happy to use your semantics from now on.
> > 
> > In either case, we should make sure this is clear in the ABI.  I thought
> > that the "It does not necessarily indicate that the attribute can be
> > read or written in the device's current state." implied my
> > interpretation, but maybe I'm missing some subtlety there?
> 
> Well, that's what I said above. The interrupt number is not a function
> of the device state, but one of the integration of that device in the
> system. The PMU itself is not concerned with an interrupt number, only
> the GIC is.
> 
> > Do you think we should clarify the API?
> 
> Not sure. I admit my interpretation is a bit borderline. If I was brave,
> I'd say that all the interrupt numbers should be a property of the GIC,
> and not of the device. But that ship has sailed a while ago, and I'm
> weak ;-).
> 
> More seriously, I don't think this is likely to cause any issue.
> 

ok, I'll leave it as is then.

> > 
> > By the way, I now realize that we are not maintining the same
> > understanding between get/set_attr, which I really think we should, so
> > I'll add the following:
> > 
> > 
> > diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c
> > index 9b3e3ea..0cf62b7 100644
> > --- a/virt/kvm/arm/pmu.c
> > +++ b/virt/kvm/arm/pmu.c
> > @@ -551,6 +551,9 @@ int kvm_arm_pmu_v3_get_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
> >  		int __user *uaddr = (int __user *)(long)attr->addr;
> >  		int irq;
> >  
> > +		if (!irqchip_in_kernel(vcpu->kvm))
> > +			return -EINVAL;
> > +
> >  		if (!test_bit(KVM_ARM_VCPU_PMU_V3, vcpu->arch.features))
> >  			return -ENODEV;
> >  
> 
> Looks good to me.
> 

Thanks,
-Christoffer



More information about the linux-arm-kernel mailing list