[PATCH 1/6] dt-bindings: interrupt-controller: arm,gic-v3: Add quirk for Mediatek SoCs w/ broken FW

Marc Zyngier maz at kernel.org
Fri May 12 01:02:35 PDT 2023


On Thu, 11 May 2023 23:05:35 +0100,
Douglas Anderson <dianders at chromium.org> wrote:
> 
> When trying to turn on the "pseudo NMI" kernel feature in Linux, it
> was discovered that all Mediatek-based Chromebooks that ever shipped
> (at least ones with GICv3) had a firmware bug where they wouldn't save
> certain GIC "GICR" registers properly. If a processor ever entered a
> suspend/idle mode where the GICR registers lost state then they'd be
> reset to their default state.
> 
> As a result of the bug, if you try to enable "pseudo NMIs" on the
> affected devices then certain interrupts will unexpectedly get
> promoted to be "pseudo NMIs" and cause crashes / freezes / general
> mayhem.
> 
> ChromeOS is looking to start turning on "pseudo NMIs" in production to
> make crash reports more actionable. To do so, we will release firmware
> updates for at least some of the affected Mediatek Chromebooks.
> However, even when we update the firmware of a Chromebook it's always
> possible that a user will end up booting with old firmware. We need to
> be able to detect when we're running with firmware that will crash and
> burn if pseudo NMIs are enabled.
> 
> The current plan is:
> * Update the device trees of all affected Chromebooks to include the
>   'mediatek,gicr-save-quirk' property. The kernel can use this to know
>   not to enable certain features like "pseudo NMI". NOTE: device trees
>   for Chromebooks are never baked into the firmware but are bundled
>   with the kernel. A kernel will never be configured to use "pseudo
>   NMIs" and be bundled with an old device tree.
> * When we get a fixed firmware for one of these Chromebooks, it will
>   patch the device tree to remove this property.

Since you're in control of distributing the FW together with the
kernel, I assume you're also in control of the command line. Why can't
that firmware pass the option enabling the pseudo-NMI support,
dispensing ourselves from all of this?

> 
> For some details, you can also see the public bug
> <https://issuetracker.google.com/281831288>
> 
> Signed-off-by: Douglas Anderson <dianders at chromium.org>
> ---
> 
>  .../bindings/interrupt-controller/arm,gic-v3.yaml           | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/Documentation/devicetree/bindings/interrupt-controller/arm,gic-v3.yaml b/Documentation/devicetree/bindings/interrupt-controller/arm,gic-v3.yaml
> index 92117261e1e1..8c251caae537 100644
> --- a/Documentation/devicetree/bindings/interrupt-controller/arm,gic-v3.yaml
> +++ b/Documentation/devicetree/bindings/interrupt-controller/arm,gic-v3.yaml
> @@ -166,6 +166,12 @@ properties:
>    resets:
>      maxItems: 1
>  
> +  mediatek,gicr-save-quirk:

I think this deserves something *much* stronger that outlines what is
wrong, because this is not just a quirk. This is a failure to even
remotely grasp the requirements of the architecture (and to use
standard, public code that would have done it correctly). Something
like "mediatek,broken-save-restore-fw" would be more adequate.

> +    type: boolean
> +    description:
> +      Asserts that the firmware on this device has issues saving and restoring
> +      GICR registers when CPUs are powered off.

Nit: not the the CPUs, but the GIC redistributors.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.



More information about the linux-arm-kernel mailing list