[PATCH 1/2] KVM: arm/arm64: vgic: Don't populate multiple LRs with the same vintid

Thu Mar 8 02:19:49 PST 2018

On 07/03/18 23:34, Christoffer Dall wrote:
> On Wed, Mar 7, 2018 at 12:40 PM, Marc Zyngier <marc.zyngier at arm.com> wrote:
>> The vgic code is trying to be clever when injecting GICv2 SGIs,
>> and will happily populate LRs with the same interrupt number if
>> they come from multiple vcpus (after all, they are distinct
>> interrupt sources).
>>
>> Unfortunately, this is against the letter of the architecture,
>> and the GICv2 architecture spec says "Each valid interrupt stored
>> in the List registers must have a unique VirtualID for that
>> virtual CPU interface.". GICv3 has similar (although slightly
>> ambiguous) restrictions.
>>
>> This results in guests locking up when using GICv2-on-GICv3, for
>> example. The obvious fix is to stop trying so hard, and inject
>> a single vcpu per SGI per guest entry. After all, pending SGIs
>> with multiple source vcpus are pretty rare, and are mostly seen
>> in scenario where the physical CPUs are severely overcomitted.
>>
>> Cc: stable at vger.kernel.org
>> Fixes: 0919e84c0fc1 ("KVM: arm/arm64: vgic-new: Add IRQ sync/flush framework")
>> Signed-off-by: Marc Zyngier <marc.zyngier at arm.com>
>> ---
>>  virt/kvm/arm/vgic/vgic.c | 11 +----------
>>  1 file changed, 1 insertion(+), 10 deletions(-)
>>
>> diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
>> index c7c5ef190afa..1f7ff175f47b 100644
>> --- a/virt/kvm/arm/vgic/vgic.c
>> +++ b/virt/kvm/arm/vgic/vgic.c
>> @@ -720,18 +720,9 @@ static void vgic_flush_lr_state(struct kvm_vcpu *vcpu)
>>         list_for_each_entry(irq, &vgic_cpu->ap_list_head, ap_list) {
>>                 spin_lock(&irq->irq_lock);
>>
>> -               if (unlikely(vgic_target_oracle(irq) != vcpu))
>> -                       goto next;
>> -
>> -               /*
>> -                * If we get an SGI with multiple sources, try to get
>> -                * them in all at once.
>> -                */
>> -               do {
>> +               if (likely(vgic_target_oracle(irq) == vcpu))
>>                         vgic_populate_lr(vcpu, irq, count++);
> 
> I think we need to change vgic_populate_lr to set the EOI maintenance
> interrupt flag so that when the interrupt is deactivated, if there are
> additional pending sources, we exit the guest and pick up the
> interrupt.

Potentially. We need to be careful about about the semantics of EOI MI
with non-level interrupts (see the other thread about EOI signalling).

> An alternative would be to set the underflow interrupt, but I don't
> think that would be correct for multiple priorities, because the SGI
> could have a higher priority than other pending interrupts we put in
> the LR.

I don't think priorities are the issue (after all, we already sort the
irqs in order of priority). My worry is that underflow is allowed to
fire if there is one interrupt pending, which implies that you could
end-up in a livelock scenario if you only have one SGI pending with
multiple sources.

Another possibility would be to use ICH_HCR_EL2.NPIE (GICH_HCR.NPIE on
GICv2), which delivers a a MI if no pending interrupts are present. Once
the SGI has been activated, we're guaranteed to be able to inject a new
pending one.

I like the latter, because it doesn't overload the rest of the code with
new semantics. Thoughts?

	M.
-- 
Jazz is not dead. It just smells funny...