[RFC PATCH 3/3] arm64: errata: Disable FWB on parts with non-ARM interconnects

Tue Feb 21 09:48:30 PST 2023

Hi Marc,

On 16/02/2023 18:46, Marc Zyngier wrote:
> On Thu, 16 Feb 2023 18:22:01 +0000,
> James Morse <james.morse at arm.com> wrote:
>>
>> Force Write Back (FWB) allows the hypervisor to force non-cacheable
>> accesses made by a guest to be cacheable. This saves the hypervisor
>> from doing cache maintenance on all pages the guest can access, to
>> ensure the guest doesn't see stale (and possibly sensitive) data when
>> making a non-cacheable access.
>>
>> When stage1 translation is disabled, the SCTLR_EL1.I bit controls the
>> attributes used for instruction fetch; one of the options results in a
>> non-cacheable access. A whole host of CPUs missed the FWB override
>> in this case, meaning a KVM guest could fetch stale/junk data instead of
>> instructions.
>>
>> The workaround is to always do the cache maintenance. These parts don't
>> have fine-grained-traps, so it isn't feasible to detect the guest
>> disabling the MMU. Instead, disable FWB on the host.
>>
>> While the CPUs are affected, this erratum doesn't occur on parts using
>> Arm's CMN interconnects. Use the Errata Management API to discover whether
>> this CPU is affected.
>>
>> Because guest execution is compromised, the workaround is enabled by
>> default. If the Errata Management API isn't implemented by firmware, the
>> workaround will be enabled. If a target platform is not affected, and it
>> isn't possible to add support for the Errata Management API, the erratum
>> can be disabled in Kconfig.

> I'm feeling a bit sick...

> My main concern is hardly anyone implements this errata management
> API, if at all. We should:

If anyone? Today no-one implements it!

We've always had to update either the firmware or the kernel for any errata workaround. I
agree that needing 'both' is annoying, but if half the story was missing, you already had a
problem.

> - give people an option to disable this from the command-line if they
>   know they are on an unaffected system

(my least favourite)

> - have some form of DT property that indicates the HW isn't affected

All perfectly valid options. The one part Arm is aware of that is affected uses Neoverse-V2,
which is much more likely to appear in ACPI machines. Firmware discovery is preferable
to trying to match the 'OEM id' of some random ACPI table to determine if the part is
affected - that whole model falls down if the SoC is OEM'd (Dell, HP, Lenovo, etc).

I think it's fair to say you have to support the firmware discovery API if you use ACPI,
and it's optional for DT.
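
To make that policy concrete, here is a rough sketch of the decision I have in mind. It is
not the code from the patch: the enums, must_disable_fwb() and the 'DT says unaffected'
flag are all made-up names for illustration, and the DT property is only the suggestion
from above, not an existing binding. Treating an erratum the firmware has never heard of
the same as a missing API is also my assumption.

```c
/*
 * Illustrative sketch only. None of these names exist in the kernel or in
 * the Errata Management API specification.
 */
#include <stdbool.h>

/* Hypothetical summary of what firmware discovery reported. */
enum fw_answer {
	FW_NO_API,		/* firmware does not implement the API */
	FW_UNKNOWN_ERRATUM,	/* API present, erratum not known to it */
	FW_AFFECTED,		/* firmware says this part is affected */
	FW_NOT_AFFECTED,	/* firmware says this part is not affected */
};

/* Hypothetical: how the platform was described to the OS. */
enum fw_table {
	TABLE_ACPI,
	TABLE_DT,
};

/*
 * Should the host run without FWB, and so always do the cache maintenance?
 *
 * kconfig_enabled:    the erratum's Kconfig option is selected.
 * dt_says_unaffected: hypothetical DT property; ignored on ACPI systems.
 */
bool must_disable_fwb(bool kconfig_enabled, enum fw_table table,
		      enum fw_answer answer, bool dt_says_unaffected)
{
	/* Built without the workaround: nothing to do. */
	if (!kconfig_enabled)
		return false;

	/* Firmware gave a definite answer: trust it. */
	if (answer == FW_AFFECTED)
		return true;
	if (answer == FW_NOT_AFFECTED)
		return false;

	/* No definite answer. Only DT systems get the property escape hatch. */
	if (table == TABLE_DT && dt_says_unaffected)
		return false;

	/* Guest execution is at stake, so default to the workaround. */
	return true;
}
```

With this shape, an ACPI machine whose firmware lacks the API always gets the workaround,
which is the "you have to support firmware discovery on ACPI" part of the argument.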

Thanks,

James