[PATCH 2/5] KVM: arm64: Work around GICv3 locally generated SErrors

Marc Zyngier maz at kernel.org
Mon Oct 4 06:25:12 PDT 2021


Hi Alex,

On Mon, 04 Oct 2021 12:23:41 +0100,
Alexandru Elisei <alexandru.elisei at arm.com> wrote:
> 
> Hi Marc,
> 
> On 9/24/21 09:25, Marc Zyngier wrote:
> > The infamous M1 has a feature nobody else ever implemented,
> > in the form of the "GIC locally generated SError interrupts",
> > also known as SEIS for short.
> >
> > These SErrors are generated when a guest does something that violates
> > the GIC state machine. It would have been simpler to just *ignore*
> > the damned thing, but that's not what this HW does. Oh well.
> >
> > This part of the architecture is also amazingly under-specified.
> > There are all of 10 lines that describe the feature in a spec that
> > is 930 pages long, and some of these lines are factually wrong.
> > Oh, and it is deprecated, so the incentive to clarify it is low.
> >
> > Now, the spec says that this should be a *virtual* SError when
> > HCR_EL2.AMO is set. As it turns out, that's not always the case
> > on this CPU, and the SError sometimes fires on the host as a
> > physical SError. Goodbye, cruel world. This clearly is a HW bug,
> > and it means that a guest can easily take the host down, on demand.
> >
> > Thankfully, we have seen systems that were just as broken in the
> > past, and we have the perfect vaccine for it.
> >
> > Apple M1, please meet the Cavium ThunderX workaround. All your
> > GIC accesses will be trapped, sanitised, and emulated. Only the
> > signalling aspect of the HW will be used. It won't be super speedy,
> > but it will at least be safe. You're most welcome.
> >
> > Given that this has only ever been seen on this single implementation,
> > that the spec is unclear at best and that we cannot trust it to ever
> > be implemented correctly, gate the workaround solely on ICH_VTR_EL2.SEIS
> > being set.
> 
> I grepped for "system error" in Arm IHI 0069F, and it turns out there are a
> number of ways to make the GIC generate one:
> 
> - When programming the ITS
> 
> - On a write to ICC_DIR_EL1 (or the corresponding virtual CPU interface register)
> when split priority drop/interrupt deactivation is not enabled.
> 
> - On a write to GICV_AEOIR or GICC_DIR.
> 
> The ITS and the legacy GICv2 interface are memory mapped, so I am
> going to trust that KVM emulates them correctly and avoids putting
> the GIC into a state that triggers the SErrors.

And to be clear, if the host kernel was doing the wrong thing, it
would take a *physical* SError. And on the M1, it really doesn't
matter as there is no physical GIC.

> Since the CPU interface registers are accessed directly by the guest,
> changing that to trap-and-emulate looks like the only way to prevent
> the guest from crashing the host with an SError.
> 
> As for making the trap-and-emulate depend on ICH_VTR_EL2.SEIS
> being set, that sounds reasonable to me, considering that there have
> been no reports so far of this being implemented. And if it turns out
> that there are devices which implement GIC-generated SErrors
> *correctly* and the trap-and-emulate cost is too much, then we can
> always get an erratum number from Apple and have the trapping depend
> on that, right?

I have very little hope that we can get Apple to give us anything
here. The CPU doesn't even advertise that it has a vGIC, so we're in
uncharted territory. But we could definitely key that on the MIDR.
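For illustration only, here is a minimal user-space sketch of the gating discussed in this thread: test ICH_VTR_EL2.SEIS and, if it is set, pick ICH_HCR_EL2 trap bits so that every guest access to the GIC CPU interface is trapped and emulated, in the spirit of the ThunderX-style workaround. This is not the code from the patch. The helper names (`gic_has_seis`, `seis_trap_bits`) are hypothetical, and the bit positions (SEIS at bit 22 of ICH_VTR_EL2; TC, TALL0 and TALL1 at bits 10 to 12 of ICH_HCR_EL2) are assumed from the GICv3 architecture register layout. A MIDR-based check, as suggested above, could replace or supplement `gic_has_seis()` without changing the rest.

```c
#include <stdbool.h>
#include <stdint.h>

/* ICH_VTR_EL2: locally generated SErrors supported (assumed bit 22). */
#define ICH_VTR_SEIS  (UINT64_C(1) << 22)

/* ICH_HCR_EL2 trap controls (assumed GICv3 bit positions). */
#define ICH_HCR_TC    (UINT64_C(1) << 10) /* trap registers common to group 0 and 1 */
#define ICH_HCR_TALL0 (UINT64_C(1) << 11) /* trap all group 0 registers */
#define ICH_HCR_TALL1 (UINT64_C(1) << 12) /* trap all group 1 registers */

/* Hypothetical helper: does the CPU interface advertise SEIS? */
static bool gic_has_seis(uint64_t ich_vtr)
{
	return (ich_vtr & ICH_VTR_SEIS) != 0;
}

/*
 * Hypothetical helper: ICH_HCR_EL2 trap bits needed so that all guest
 * accesses to the GIC CPU interface are trapped to EL2 and emulated.
 * Returns 0 when SEIS is not advertised (guest keeps direct access).
 */
static uint64_t seis_trap_bits(uint64_t ich_vtr)
{
	if (!gic_has_seis(ich_vtr))
		return 0;

	/* Trap group 0, group 1 and the common registers (which include DIR). */
	return ICH_HCR_TALL0 | ICH_HCR_TALL1 | ICH_HCR_TC;
}
```

The caller would OR the returned value into the ICH_HCR_EL2 value it programs on guest entry. Trapping everything is the blunt choice: it costs an exit on every CPU interface access, but it leaves the guest no direct path to the registers whose misuse raises the SError.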

> Reviewed-by: Alexandru Elisei <alexandru.elisei at arm.com>

Thanks!

	M.

-- 
Without deviation from the norm, progress is not possible.