[bug report] GICv4.1: vSGI remains pending across the guest reset

Mon Dec 18 23:24:55 PST 2023

Hi Marc and Oliver,

On 2023/12/17 19:26, Marc Zyngier wrote:
> On Thu, 14 Dec 2023 12:13:57 +0000,
> Kunkun Jiang <jiangkunkun at huawei.com> wrote:
>> Hi list,
>>
>> We have observed on GICv4.1 systems that, after a guest reset, the
>> secondary VCPU would receive an IPI_CPU_STOP accidently and failed to
>> come online eventually.
>>
>> | Guest User-space
> Guest userspace????
Sorry. The formatting is a bit messy.
|            Guest                               User-space
|
|  reset (with a vSGI pending in the
|  hardware) [0]
|
|                                     disable the distributor (write 0
|                                     into GICD_CTLR) [1]
|
|                                     clear pending status (write 0 into
|                                     GICR_ISPENDR0 for each RD) [2]
|
|  disable the distributor (write 0
|  into GICD_CTLR) [3]
|
|  enable the distributor with ARE,
|  Group1 and nASSGIreq [4]
>> |
>> | reset (with a vSGI pending in the
>> | hardware) [0]
>> |
>> | disable the distributor (write 0
>> | into GICD_CTLR) [1]
>> |
>> | clear pending status (write 0 into
>> | GICR_ISPENDR0 for each RD) [2]
> This cannot clear the pending bits. Writing 0 to any of the
> ISPEND/ICPEND registers is effectively a NOP.
>
> If you want to remove the pending SGIs, you need to write a bunch of
> 1s to ICPENDR0.
>
>> |
>> | disable the distributor (write 0
>> | into GICD_CTLR) [3]
>> |
>> | enable the distributor with ARE,
>> | Group1 and nASSGIreq [4]
>>
>> The problem is that even if user-space tries to disable the distributor
> I don't understand what userspace does here. Why is it significant
> that it is an EL0 access? I don't know of any SW doing that, but even
> if it existed, this should make no difference.
>
>> and clear pending bits for all SGIs, we don't actually propogate it into
>> HW (we only record it via vgic_dist->{enabled, nassgireq} and
>> vgic_irq->pending_latch) and the vSGI remains pending across the guest
>> reset.
>>
>> And when we're at [4], all vSGI's vgic_irq->hw are *true* and
>> vgic_v4_enable_vsgis() becomes a nop.. That's not good.
>>
>> The following solution can solve the problem. Not sure if this is a good
>> solution.Sent out early for suggestions or solutions.
>>
>> diff --git a/arch/arm64/kvm/vgic/vgic-mmio-v3.c
>> b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
>> index 89117ba2528a..3678ab33f5b9 100644
>> --- a/arch/arm64/kvm/vgic/vgic-mmio-v3.c
>> +++ b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
>> @@ -374,6 +374,10 @@ static int vgic_v3_uaccess_write_pending(struct
>> kvm_vcpu *vcpu,
>>               irq->pending_latch = true;
>>               vgic_queue_irq_unlock(vcpu->kvm, irq, flags);
>>           } else {
>> +             if (irq->hw && vgic_irq_is_sgi(irq->intid))
>> +                 irq_set_irqchip_state(irq->host_irq,
>> +                              IRQCHIP_STATE_PENDING,
>> +                              false);
>>               irq->pending_latch = false;
>>               raw_spin_unlock_irqrestore(&irq->irq_lock, flags);
>>           }
>>
> But this has *nothing* to do with the guest. This is the *host*
> userspace performing a write to the redistributor view, which has
> different semantics. Which is why your earlier description made no
> sense to me.
>
> I think the problem is slightly larger than what you describe. A write
> to ISPENDR0 should be propagated to the ITS for any values of the
> latch, just like this happens on enabling HW-backed SGIs.
>
> Can you please give this a go?
I have given it a go. I tested reseting the guest 100 times in succession,
and the problem did not occur again.

Thanks,
Kunkun Jiang
> Thanks,
>
> 	M.
>
> >From 9932d74392d969057e84bc3c18bd15f1769b6211 Mon Sep 17 00:00:00 2001
> From: Marc Zyngier <maz at kernel.org>
> Date: Sun, 17 Dec 2023 11:15:09 +0000
> Subject: [PATCH] KVM: arm64: vgic-v4: Restore pending state on host userspace
>   write
>
> When the VMM writes to ISPENDR0 to set the state pending state of
> an SGI, we fail to convey this to the HW if this SGI is already
> backed by a GICv4.1 vSGI.
>
> This is a bit of a corner case, as this would only occur if the
> vgic state is changed on an already running VM, but this can
> apparently happen across a guest reset driven by the VMM.
>
> Fix this by always writing out the pending_latch value to the
> HW, and reseting it to false.
>
> Reported-by: Kunkun Jiang <jiangkunkun at huawei.com>
> Signed-off-by: Marc Zyngier <maz at kernel.org>
> Link: https://lore.kernel.org/r/7e7f2c0c-448b-10a9-8929-4b8f4f6e2a32@huawei.com
> ---
>   arch/arm64/kvm/vgic/vgic-mmio-v3.c | 27 +++++++++++++++++----------
>   1 file changed, 17 insertions(+), 10 deletions(-)
>
> diff --git a/arch/arm64/kvm/vgic/vgic-mmio-v3.c b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
> index a764b0ab8bf9..2533f264b954 100644
> --- a/arch/arm64/kvm/vgic/vgic-mmio-v3.c
> +++ b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
> @@ -365,19 +365,26 @@ static int vgic_v3_uaccess_write_pending(struct kvm_vcpu *vcpu,
>   		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
>   
>   		raw_spin_lock_irqsave(&irq->irq_lock, flags);
> -		if (test_bit(i, &val)) {
> -			/*
> -			 * pending_latch is set irrespective of irq type
> -			 * (level or edge) to avoid dependency that VM should
> -			 * restore irq config before pending info.
> -			 */
> -			irq->pending_latch = true;
> -			vgic_queue_irq_unlock(vcpu->kvm, irq, flags);
> -		} else {
> +
> +		/*
> +		 * pending_latch is set irrespective of irq type
> +		 * (level or edge) to avoid dependency that VM should
> +		 * restore irq config before pending info.
> +		 */
> +		irq->pending_latch = test_bit(i, &val);
> +
> +		if (irq->hw && vgic_irq_is_sgi(irq->intid)) {
> +			irq_set_irqchip_state(irq->host_irq,
> +					      IRQCHIP_STATE_PENDING,
> +					      irq->pending_latch);
>   			irq->pending_latch = false;
> -			raw_spin_unlock_irqrestore(&irq->irq_lock, flags);
>   		}
>   
> +		if (irq->pending_latch)
> +			vgic_queue_irq_unlock(vcpu->kvm, irq, flags);
> +		else
> +			raw_spin_unlock_irqrestore(&irq->irq_lock, flags);
> +
>   		vgic_put_irq(vcpu->kvm, irq);
>   	}
>