[PATCH 2/5] KVM: arm64: Work around GICv3 locally generated SErrors

Alexandru Elisei alexandru.elisei at arm.com
Mon Oct 4 04:23:41 PDT 2021


Hi Marc,

On 9/24/21 09:25, Marc Zyngier wrote:
> The infamous M1 has a feature nobody else ever implemented,
> in the form of the "GIC locally generated SError interrupts",
> also known as SEIS for short.
>
> These SErrors are generated when a guest does something that violates
> the GIC state machine. It would have been simpler to just *ignore*
> the damned thing, but that's not what this HW does. Oh well.
>
> This part of the architecture is also amazingly under-specified.
> There is a whole 10 lines that describe the feature in a spec that
> is 930 pages long, and some of these lines are factually wrong.
> Oh, and it is deprecated, so the incentive to clarify it is low.
>
> Now, the spec says that this should be a *virtual* SError when
> HCR_EL2.AMO is set. As it turns out, that's not always the case
> on this CPU, and the SError sometimes fires on the host as a
> physical SError. Goodbye, cruel world. This clearly is a HW bug,
> and it means that a guest can easily take the host down, on demand.
>
> Thankfully, we have seen systems that were just as broken in the
> past, and we have the perfect vaccine for it.
>
> Apple M1, please meet the Cavium ThunderX workaround. All your
> GIC accesses will be trapped, sanitised, and emulated. Only the
> signalling aspect of the HW will be used. It won't be super speedy,
> but it will at least be safe. You're most welcome.
>
> Given that this has only ever been seen on this single implementation,
> that the spec is unclear at best and that we cannot trust it to ever
> be implemented correctly, gate the workaround solely on ICH_VTR_EL2.SEIS
> being set.

I grepped for "system error" in Arm IHI 0069F, and it turns out there are a
number of ways to make the GIC generate one:

- When programming the ITS

- On a write to ICC_DIR_EL1 (or the corresponding virtual CPU interface register)
when split priority drop/interrupt deactivation is not enabled.

- On a write to GICV_AEOIR or GICC_DIR.

The ITS and the legacy GICv2 interface are memory mapped, so I am going to trust
that KVM emulates them correctly and avoids putting the GIC into a state that
triggers the SErrors.

The CPU interface registers are accessed directly by the guest, so changing that
to trap-and-emulate looks like the only way to prevent the guest from crashing the
host with an SError.
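
To illustrate what "trap, sanitise, emulate" could look like for the ICC_DIR_EL1
case listed above, here is a minimal, self-contained sketch. It is a toy model,
not KVM's actual emulation code: struct toy_vcpu_if, toy_emulate_dir_write() and
the 32-entry active array are all hypothetical names made up for this example.
The point is only that a trapped write made while split priority
drop/deactivation is disabled gets quietly ignored in software instead of
reaching the hardware, where it could raise an SError.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily simplified model of a virtual CPU interface. */
struct toy_vcpu_if {
	bool eoimode;     /* true: split priority drop/deactivation enabled */
	bool active[32];  /* active state for toy INTIDs 0..31 */
};

enum toy_dir_result { TOY_DIR_IGNORED, TOY_DIR_DEACTIVATED };

/*
 * Emulate a trapped guest write to the deactivate register.
 * Writes that would violate the state machine are dropped rather than
 * being forwarded to the hardware.
 */
static enum toy_dir_result toy_emulate_dir_write(struct toy_vcpu_if *cpu_if,
						 uint32_t intid)
{
	/* Split mode disabled: the write is meaningless, so ignore it. */
	if (!cpu_if->eoimode)
		return TOY_DIR_IGNORED;

	/* Out-of-range or inactive interrupt: nothing to deactivate. */
	if (intid >= 32 || !cpu_if->active[intid])
		return TOY_DIR_IGNORED;

	cpu_if->active[intid] = false;
	return TOY_DIR_DEACTIVATED;
}
```

With eoimode false, toy_emulate_dir_write() leaves the active state untouched
whatever the guest writes; only with eoimode true does an active interrupt get
deactivated.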

As for making the trap-and-emulate depend on ICH_VTR_EL2.SEIS being set, that
sounds reasonable to me, considering that there have been no reports so far of
this being implemented. And if it turns out that there are devices which implement
GIC generated SErrors *correctly* and the trap-and-emulate cost is too much, then
we can always get an erratum number from Apple and have the trapping depend on
that, right?
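
For reference, the gating being discussed boils down to something like the
standalone sketch below. The TOY_* macros and toy_trap_bits_for() are
hypothetical names for this example only; the bit positions (SEIS at bit 22 of
ICH_VTR_EL2; TC, TALL0 and TALL1 at bits 10, 11 and 12 of ICH_HCR_EL2) are my
reading of the GICv3 architecture and should be checked against the spec. The
kernel has its own definitions and applies the traps elsewhere.

```c
#include <stdint.h>

/* Assumed bit positions, see the note above. */
#define TOY_ICH_VTR_SEIS   (UINT32_C(1) << 22)  /* locally generated SEI support */
#define TOY_ICH_HCR_TC     (UINT32_C(1) << 10)  /* trap common sysreg accesses */
#define TOY_ICH_HCR_TALL0  (UINT32_C(1) << 11)  /* trap all Group 0 accesses */
#define TOY_ICH_HCR_TALL1  (UINT32_C(1) << 12)  /* trap all Group 1 accesses */

/*
 * Given the value read from ICH_VTR_EL2, return the ICH_HCR_EL2 trap bits
 * the workaround would request: everything if SEIS is advertised,
 * nothing otherwise.
 */
static uint32_t toy_trap_bits_for(uint32_t ich_vtr)
{
	if (ich_vtr & TOY_ICH_VTR_SEIS)
		return TOY_ICH_HCR_TC | TOY_ICH_HCR_TALL0 | TOY_ICH_HCR_TALL1;

	return 0;
}
```

If a correct SEIS implementation ever shows up, an erratum-based check would
replace the plain SEIS test in the if condition, and the rest would stay as is.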

Reviewed-by: Alexandru Elisei <alexandru.elisei at arm.com>

Thanks,

Alex

>
> Signed-off-by: Marc Zyngier <maz at kernel.org>
> ---
>  arch/arm64/kvm/vgic/vgic-v3.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/arch/arm64/kvm/vgic/vgic-v3.c b/arch/arm64/kvm/vgic/vgic-v3.c
> index 21a6207fb2ee..ae59e2580bf5 100644
> --- a/arch/arm64/kvm/vgic/vgic-v3.c
> +++ b/arch/arm64/kvm/vgic/vgic-v3.c
> @@ -671,6 +671,14 @@ int vgic_v3_probe(const struct gic_kvm_info *info)
>  		group1_trap = true;
>  	}
>  
> +	if (kvm_vgic_global_state.ich_vtr_el2 & ICH_VTR_SEIS_MASK) {
> +		kvm_info("GICv3 with locally generated SEI\n");
> +
> +		group0_trap = true;
> +		group1_trap = true;
> +		common_trap = true;
> +	}
> +
>  	if (group0_trap || group1_trap || common_trap) {
>  		kvm_info("GICv3 sysreg trapping enabled ([%s%s%s], reduced performance)\n",
>  			 group0_trap ? "G0" : "",


