[PATCH v2 1/9] KVM: arm64: Allow creating the PMU without the in-kernel GIC

Wed May 24 09:37:03 PDT 2017

On 24/05/17 12:45, Christoffer Dall wrote:
> On Wed, May 24, 2017 at 10:16:56AM +0100, Marc Zyngier wrote:
>> On 24/05/17 09:38, Christoffer Dall wrote:
>>> On Tue, May 23, 2017 at 05:52:01PM +0100, Marc Zyngier wrote:
>>>> On 16/05/17 19:45, Christoffer Dall wrote:
>>>>> Since we got support for devices in userspace which allows reporting the
>>>>> PMU overflow output status to userspace, we should actually allow
>>>>> creating the PMU on systems without an in-kernel irqchip, which in turn
>>>>> requires us to slightly clarify error codes for the ABI and move things
>>>>> around for the initialization phase.
>>>>>
>>>>> Signed-off-by: Christoffer Dall <cdall at linaro.org>
>>>>> ---
>>>>>  Documentation/virtual/kvm/devices/vcpu.txt | 16 +++++++++-------
>>>>>  virt/kvm/arm/pmu.c                         | 30 ++++++++++++++++++++----------
>>>>>  2 files changed, 29 insertions(+), 17 deletions(-)
>>>>>
>>>>> diff --git a/Documentation/virtual/kvm/devices/vcpu.txt b/Documentation/virtual/kvm/devices/vcpu.txt
>>>>> index 02f5068..352af6e 100644
>>>>> --- a/Documentation/virtual/kvm/devices/vcpu.txt
>>>>> +++ b/Documentation/virtual/kvm/devices/vcpu.txt
>>>>> @@ -16,7 +16,9 @@ Parameters: in kvm_device_attr.addr the address for PMU overflow interrupt is a
>>>>>  Returns: -EBUSY: The PMU overflow interrupt is already set
>>>>>           -ENXIO: The overflow interrupt not set when attempting to get it
>>>>>           -ENODEV: PMUv3 not supported
>>>>> -         -EINVAL: Invalid PMU overflow interrupt number supplied
>>>>> +         -EINVAL: Invalid PMU overflow interrupt number supplied or
>>>>> +                  trying to set the IRQ number without using an in-kernel
>>>>> +                  irqchip.
>>>>>  
>>>>>  A value describing the PMUv3 (Performance Monitor Unit v3) overflow interrupt
>>>>>  number for this vcpu. This interrupt could be a PPI or SPI, but the interrupt
>>>>> @@ -25,11 +27,11 @@ all vcpus, while as an SPI it must be a separate number per vcpu.
>>>>>  
>>>>>  1.2 ATTRIBUTE: KVM_ARM_VCPU_PMU_V3_INIT
>>>>>  Parameters: no additional parameter in kvm_device_attr.addr
>>>>> -Returns: -ENODEV: PMUv3 not supported
>>>>> -         -ENXIO: PMUv3 not properly configured as required prior to calling this
>>>>> -                 attribute
>>>>> +Returns: -ENODEV: PMUv3 not supported or GIC not initialized
>>>>> +         -ENXIO: PMUv3 not properly configured or in-kernel irqchip not
>>>>> +                 conigured as required prior to calling this attribute
>>>>
>>>>                     configured
>>>>
>>>>>           -EBUSY: PMUv3 already initialized
>>>>>  
>>>>> -Request the initialization of the PMUv3.  This must be done after creating the
>>>>> -in-kernel irqchip.  Creating a PMU with a userspace irqchip is currently not
>>>>> -supported.
>>>>> +Request the initialization of the PMUv3.  If using the PMUv3 with an in-kernel
>>>>> +virtual GIC implementation, this must be done after initializing the in-kernel
>>>>> +irqchip.
>>>>> diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c
>>>>> index 4b43e7f..7209185 100644
>>>>> --- a/virt/kvm/arm/pmu.c
>>>>> +++ b/virt/kvm/arm/pmu.c
>>>>> @@ -456,21 +456,25 @@ static int kvm_arm_pmu_v3_init(struct kvm_vcpu *vcpu)
>>>>>  	if (!kvm_arm_support_pmu_v3())
>>>>>  		return -ENODEV;
>>>>>  
>>>>> -	/*
>>>>> -	 * We currently require an in-kernel VGIC to use the PMU emulation,
>>>>> -	 * because we do not support forwarding PMU overflow interrupts to
>>>>> -	 * userspace yet.
>>>>> -	 */
>>>>> -	if (!irqchip_in_kernel(vcpu->kvm) || !vgic_initialized(vcpu->kvm))
>>>>> -		return -ENODEV;
>>>>> -
>>>>> -	if (!test_bit(KVM_ARM_VCPU_PMU_V3, vcpu->arch.features) ||
>>>>> -	    !kvm_arm_pmu_irq_initialized(vcpu))
>>>>> +	if (!test_bit(KVM_ARM_VCPU_PMU_V3, vcpu->arch.features))
>>>>>  		return -ENXIO;
>>>>>  
>>>>>  	if (kvm_arm_pmu_v3_ready(vcpu))
>>>>>  		return -EBUSY;
>>>>>  
>>>>> +	if (irqchip_in_kernel(vcpu->kvm)) {
>>>>> +		/*
>>>>> +		 * If using the PMU with an in-kernel virtual GIC
>>>>> +		 * implementation, we require the GIC to be already
>>>>> +		 * initialized when initializing the PMU.
>>>>> +		 */
>>>>> +		if (!vgic_initialized(vcpu->kvm))
>>>>> +			return -ENODEV;
>>>>> +
>>>>> +		if (!kvm_arm_pmu_irq_initialized(vcpu))
>>>>> +			return -ENXIO;
>>>>> +	}
>>>>> +
>>>>
>>>> Do we also need to prevent a vgic to be created if the PMU has been
>>>> initialized beforehand?
>>>>
>>>
>>> Sigh.  We probably have to.
>>>
>>> I don't like having a cross-VGIC-PMU check, but we could do something
>>> like setting a flag on the kvm struct so that irqchip_in_user() always
>>> return true, and if that is set, it is not possible to create the VGIC.
>>>
>>> Alternatively we can make the PMU init a no-op, and try to enable it via
>>> the first-vcpu-run path, like the timer, and check that everything lines
>>> up then (i.e. you have in-kernel irqchip with a non-conflicting irq
>>> number or you have a userspace irqchip).
>>
>> I like this second solution better, as it gives us a common approach to
>> similar problems.
> 
> Yes, but on the other hand it's a bit like the lazy init stuff, where we
> defer things to happen at some point in time, and the whole PMU init
> thing was to avoid that.  On the first hand, it seems to work well for
> lots of things, so why not...

The main problem is that we now have lots of things that can be
configured in arbitrary orders. We can either check the consistency of
the current configuration at each step, or delay it until we are ready
to run, and then check everything in one go.

Overall, this is the same set of checks. We just need to decide which
way we want to go, and stick with it. Which is of course easier said
than done... ;-)

> 
>> Would that also help with not having to implement the
>> allocator you introduced in patches 6-9 (which I haven't reviewed yet)?
>>
> 
> Not really.  I mean, we could add some cross calls between the timer and
> PMU code, but I think that's fairly disgusting, and it doesn't prevent
> userspace from fiddling with an interrupt signal used in the kernel
> anyway.
> 
> With patchs 6-9, I feel like we should take one of two overall stands;
> either we don't care that userspace can wire things up share an
> interrupt line, even with in-kernel driven devices, as it would be the
> equivalent of putting an OR gate before the GIC in hardware (not that I
> recommend anyone doing that), or we should implement the full story and
> prevent userspace from ever shooting itself in the foot.

Having had a quick look at 6-9, I'm hesitant (again). On one hand, the
each patch is fairly simple and not very invasive, but on the other
hand, the result is a tiny bit ugly (all these "owner" parameters passed
around).

So the real question is still: does anything break on the host if
userspace configures something in a nonsensical way? My main concern is
the effect of having the virtual PMU sharing a line with the virtual
timer, which has the HW bit set. So let's assume that for a minute:

- PMU interrupt is asserted
- Timer interrupt is not asserted
- We inject the shared intid with the HW bit set, and the hwintid of the
virtual timer.
- We do *not* set the active bit on the redistributor (because the timer
code has noticed we don't need to)

When the guest does EOI the interrupt, it will propagate into the
physical distributor, and deactivate... I don't know what, since we
don't have an active interrupt there.

Things become even uglier if the timer fires between the activation of
the virtual interrupt and its EOI. The PMU interrupt EOI ends up
deactivating the timer on the physical distributor, leading to the timer
firing again... At that stage, I think things get pretty bad, as we've
lost track of the state.

That being said, I don't think we break anything on the host, but the
above seem disturbing enough that we don't want to let it happen. So I
guess that patches 6-9 are actually required. I guess I'll finish
reviewing this tomorrow, as I'm already at the beer festival... ;-)

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...