[PATCH 1/5] KVM: arm64: Allow creating the PMU without the in-kernel GIC
Marc Zyngier
marc.zyngier at arm.com
Thu May 4 03:16:57 PDT 2017
On 04/05/17 10:44, Christoffer Dall wrote:
> On Thu, May 04, 2017 at 10:05:43AM +0100, Marc Zyngier wrote:
>> On 04/05/17 09:38, Christoffer Dall wrote:
>>> On Thu, May 04, 2017 at 09:28:50AM +0100, Marc Zyngier wrote:
>>>> On 04/05/17 09:13, Christoffer Dall wrote:
>>>>> On Thu, May 04, 2017 at 09:09:47AM +0100, Marc Zyngier wrote:
>>>>>> On 03/05/17 19:32, Christoffer Dall wrote:
>>>>>>> Since we got support for devices in userspace which allows reporting the
>>>>>>> PMU overflow output status to userspace, we should actually allow
>>>>>>> creating the PMU on systems without an in-kernel irqchip, which in turn
>>>>>>> requires us to slightly clarify error codes for the ABI and move things
>>>>>>> around for the initialization phase.
>>>>>>>
>>>>>>> Signed-off-by: Christoffer Dall <cdall at linaro.org>
>>>>>>> ---
>>>>>>> Documentation/virtual/kvm/devices/vcpu.txt | 16 +++++++++-------
>>>>>>> virt/kvm/arm/pmu.c | 27 +++++++++++++++++----------
>>>>>>> 2 files changed, 26 insertions(+), 17 deletions(-)
>>>>>>>
>>>>>>> diff --git a/Documentation/virtual/kvm/devices/vcpu.txt b/Documentation/virtual/kvm/devices/vcpu.txt
>>>>>>> index 02f5068..352af6e 100644
>>>>>>> --- a/Documentation/virtual/kvm/devices/vcpu.txt
>>>>>>> +++ b/Documentation/virtual/kvm/devices/vcpu.txt
>>>>>>> @@ -16,7 +16,9 @@ Parameters: in kvm_device_attr.addr the address for PMU overflow interrupt is a
>>>>>>> Returns: -EBUSY: The PMU overflow interrupt is already set
>>>>>>> -ENXIO: The overflow interrupt not set when attempting to get it
>>>>>>> -ENODEV: PMUv3 not supported
>>>>>>> - -EINVAL: Invalid PMU overflow interrupt number supplied
>>>>>>> + -EINVAL: Invalid PMU overflow interrupt number supplied or
>>>>>>> + trying to set the IRQ number without using an in-kernel
>>>>>>> + irqchip.
>>>>>>>
>>>>>>> A value describing the PMUv3 (Performance Monitor Unit v3) overflow interrupt
>>>>>>> number for this vcpu. This interrupt could be a PPI or SPI, but the interrupt
>>>>>>> @@ -25,11 +27,11 @@ all vcpus, while as an SPI it must be a separate number per vcpu.
>>>>>>>
>>>>>>> 1.2 ATTRIBUTE: KVM_ARM_VCPU_PMU_V3_INIT
>>>>>>> Parameters: no additional parameter in kvm_device_attr.addr
>>>>>>> -Returns: -ENODEV: PMUv3 not supported
>>>>>>> - -ENXIO: PMUv3 not properly configured as required prior to calling this
>>>>>>> - attribute
>>>>>>> +Returns: -ENODEV: PMUv3 not supported or GIC not initialized
>>>>>>> + -ENXIO: PMUv3 not properly configured or in-kernel irqchip not
>>>>>>> + conigured as required prior to calling this attribute
>>>>>>> -EBUSY: PMUv3 already initialized
>>>>>>>
>>>>>>> -Request the initialization of the PMUv3. This must be done after creating the
>>>>>>> -in-kernel irqchip. Creating a PMU with a userspace irqchip is currently not
>>>>>>> -supported.
>>>>>>> +Request the initialization of the PMUv3. If using the PMUv3 with an in-kernel
>>>>>>> +virtual GIC implementation, this must be done after initializing the in-kernel
>>>>>>> +irqchip.
>>>>>>> diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c
>>>>>>> index 4b43e7f..f046b08 100644
>>>>>>> --- a/virt/kvm/arm/pmu.c
>>>>>>> +++ b/virt/kvm/arm/pmu.c
>>>>>>> @@ -456,21 +456,25 @@ static int kvm_arm_pmu_v3_init(struct kvm_vcpu *vcpu)
>>>>>>> if (!kvm_arm_support_pmu_v3())
>>>>>>> return -ENODEV;
>>>>>>>
>>>>>>> - /*
>>>>>>> - * We currently require an in-kernel VGIC to use the PMU emulation,
>>>>>>> - * because we do not support forwarding PMU overflow interrupts to
>>>>>>> - * userspace yet.
>>>>>>> - */
>>>>>>> - if (!irqchip_in_kernel(vcpu->kvm) || !vgic_initialized(vcpu->kvm))
>>>>>>> - return -ENODEV;
>>>>>>> -
>>>>>>> - if (!test_bit(KVM_ARM_VCPU_PMU_V3, vcpu->arch.features) ||
>>>>>>> - !kvm_arm_pmu_irq_initialized(vcpu))
>>>>>>> + if (!test_bit(KVM_ARM_VCPU_PMU_V3, vcpu->arch.features))
>>>>>>> return -ENXIO;
>>>>>>>
>>>>>>> if (kvm_arm_pmu_v3_ready(vcpu))
>>>>>>> return -EBUSY;
>>>>>>>
>>>>>>> + if (irqchip_in_kernel(vcpu->kvm)) {
>>>>>>> + /*
>>>>>>> + * If using the PMU with an in-kernel virtual GIC
>>>>>>> + * implementation, we require the GIC to be already
>>>>>>> + * initialized when initializing the PMU.
>>>>>>> + */
>>>>>>> + if (!vgic_initialized(vcpu->kvm))
>>>>>>> + return -ENODEV;
>>>>>>> +
>>>>>>> + if (!kvm_arm_pmu_irq_initialized(vcpu))
>>>>>>> + return -ENXIO;
>>>>>>> + }
>>>>>>> +
>>>>>>> kvm_pmu_vcpu_reset(vcpu);
>>>>>>> vcpu->arch.pmu.ready = true;
>>>>>>>
>>>>>>> @@ -512,6 +516,9 @@ int kvm_arm_pmu_v3_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
>>>>>>> int __user *uaddr = (int __user *)(long)attr->addr;
>>>>>>> int irq;
>>>>>>>
>>>>>>> + if (!irqchip_in_kernel(vcpu->kvm))
>>>>>>> + return -EINVAL;
>>>>>>> +
>>>>>>
>>>>>> Shouldn't we fail the same way for {get,has}_attr? get_attr is going to
>>>>>> generate a -ENXIO, and has_attr is going to lie about the availability
>>>>>> of KVM_ARM_VCPU_PMU_V3_IRQ...
>>>>>>
>>>>>
>>>>> Here's the text from api.txt:
>>>>>
>>>>> Tests whether a device supports a particular attribute. A successful
>>>>> return indicates the attribute is implemented. It does not necessarily
>>>>> indicate that the attribute can be read or written in the device's
>>>>> current state. "addr" is ignored.
>>>>>
>>>>> My interpretation therefore is that QEMU can use this ioctl to figure
>>>>> out if the feature is supported (sort of like a capability), but that
>>>>> doesn't mean that the configuration of the VM is such that the attribute
>>>>> can be get or set at that moment.
>>>>>
>>>>> For example, there will also alway be situations where you can get an
>>>>> attr, but not set an attr, what should the has_attr return then?
>>>>
>>>> My issue here is that whether we can get/set the interrupt or not is not
>>>> a function of the device itself, but of the way it is "wired". No matter
>>>> what "the device's current state" is, we'll never be able to get/set the
>>>> interrupt.
>>>>
>>>> I'd tend to err on the side of caution and return something that is
>>>> unambiguous, be maybe I have too strict an interpretation of the API.
>>>>
>>>
>>> Hmm, I see the has_attr as a method for userspace to discover "does this
>>> kernel have this feature". If we make has_attr return a value depending
>>> on the VM having an in-kernel gic or not, we implicitly require
>>> userspace to create a VM with an in-kernel GIC to discover if this
>>> kernel has that feature, and therefore also impose an ordering
>>> requirement of figuring out the capabilities of the kernel (i.e. create
>>> the GIC before checking this API).
>>>
>>> I think QEMU should be able to do:
>>>
>>> if (has_attr()) {
>>> kvm_supports_set_timer_irq = true;
>>> vtimer_irq = foo;
>>> } else {
>>> kvm_supports_set_timer_irq = false;
>>> vtimer_irq = 27; /* default, we're stuck with it */
>>> }
>>>
>>> create_board_definition();
>>> create_dt();
>>> create_acpi();
>>>
>>> /* do whatever */
>>>
>>> if (kvm_supports_set_timer_irq && kvm_irqchip_in_kernel()) {
>>> kvm_arm_timer_set_irq(...);
>>> }
>>>
>>> And all this should not be coupled to when we create the irqchip device.
>>>
>>> But I may be failing to see the case where the current implementation
>>> creates a problem for userspace, in which case we should document the
>>> ordering requirement.
>>
>> I'm not sure it would create any problem, at least not for the PMU
>> (there is no working code that would have created a PMU without an irqchip).
>>
>> It is just that we have a slightly diverging interpretation of what
>> has_attr means. You see it as "attribute that the device supports", and
>> I see it as "attribute that the device supports in this configuration".
>> I'm happy to use your semantics from now on.
>
> In either case, we should make sure this is clear in the ABI. I thought
> that the "It does not necessarily indicate that the attribute can be
> read or written in the device's current state." implied my
> interpretation, but maybe I'm missing some subtlety there?
Well, that's what I said above. The interrupt number is not a function
of the device state, but one of the integration of that device in the
system. The PMU itself is not concerned with an interrupt number, only
the GIC is.
> Do you think we should clarify the API?
Not sure. I admit my interpretation is a bit borderline. If I was brave,
I'd say that all the interrupt numbers should be a property of the GIC,
and not of the device. But that ship has sailed a while ago, and I'm
weak ;-).
More seriously, I don't think this is likely to cause any issue.
>
> By the way, I now realize that we are not maintining the same
> understanding between get/set_attr, which I really think we should, so
> I'll add the following:
>
>
> diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c
> index 9b3e3ea..0cf62b7 100644
> --- a/virt/kvm/arm/pmu.c
> +++ b/virt/kvm/arm/pmu.c
> @@ -551,6 +551,9 @@ int kvm_arm_pmu_v3_get_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
> int __user *uaddr = (int __user *)(long)attr->addr;
> int irq;
>
> + if (!irqchip_in_kernel(vcpu->kvm))
> + return -EINVAL;
> +
> if (!test_bit(KVM_ARM_VCPU_PMU_V3, vcpu->arch.features))
> return -ENODEV;
>
Looks good to me.
Thanks,
M.
--
Jazz is not dead. It just smells funny...
More information about the linux-arm-kernel
mailing list