[PATCH 0/6] arm64: errata: Disable FWB on parts with non-ARM interconnects

Robin Murphy robin.murphy at arm.com
Tue May 23 05:24:52 PDT 2023


On 2023-05-16 17:29, James Morse wrote:
> Hi Catalin, Marc,
> 
> On 11/05/2023 22:13, Catalin Marinas wrote:
>> On Thu, May 11, 2023 at 07:42:58PM +0100, Marc Zyngier wrote:
>>> On Thu, 11 May 2023 18:15:15 +0100,
>>> Catalin Marinas <catalin.marinas at arm.com> wrote:
>>>> On Thu, Mar 30, 2023 at 05:51:22PM +0100, James Morse wrote:
>>>>> When stage 1 translation is disabled, the SCTLR_EL1.I bit controls the
>>>>> attributes used for instruction fetch; one of the options results in a
>>>>> non-cacheable access. A whole host of CPUs missed the FWB override
>>>>> in this case, meaning a KVM guest could fetch stale/junk data instead of
>>>>> instructions.
>>>>>
>>>>> The workaround is to disable FWB, and do the required cache maintenance
>>>>> instead.
> 
>>>> I think the workaround can be to only do the required cache maintenance
>>>> without disabling FWB. Having FWB on doesn't bring any performance
>>>> benefits if we do the cache maintenance anyway but keeping it around may
>>>> be useful for other reasons (e.g. KVM device pass-through using
>>>> cacheable mappings, though not something KVM supports currently).
>>>
>>> But you'd also rely on the guest doing its own cache maintenance for
>>> instructions it writes, right?
>>
>> Ah, you are right. It looks like I only considered the host writing
>> instructions. If the guest disabled stage 1 and wrote some instructions
>> with FWB on, they'd not necessarily reach the PoC while the instructions
>> are fetched from PoC with this bug. Even with SCTLR_EL1.I==0, the guest
>> is supposed to do an IC IVAU if it wrote instructions, but that's not
>> sufficient (hint to the micro-architects: add a chicken bit to upgrade
>> IC IVAU to also do a DC CVAC ;))
>>
>>> Which probably means exposing a different CLIDR_EL1 so that
>>> LoC/LoUU/LoUIS are consistent with *not* having FWB... I also wonder
>>> if keeping FWB set has the potential to change the semantics of the
>>> CMOs (the spec seems silent on that front).
>>
>> Not sure about CMOs, I'd expect them to behave in the same way. However,
>> I don't see how faking CLIDR_EL1 can trick the guest into doing DC CVAC
>> when its MMU is off.
> 
> I think the request is to keep the FWB feature, but to disable it for all host memory
> the guest can execute from. I presume this 'device pass-through using cacheable mappings'
> would mark that address range as XN at stage 2 ( ... it's special, right?).
> 
> If this is for something like CXL: it can't set XN, and the guest would still be exposed
> to the erratum if it executes from these addresses with the MMU off.
> 
> Does this need doing now? It wouldn't need backporting to older kernels...
> 
> 
>>>> That said, maybe we can reduce the risk further by doing the
>>>> vcpu_has_run_once() trick with !FWB and clean the D side to PoC on a
>>>> stage 2 exec fault (together with the I-cache invalidation). We can then
>>>> ignore any other cache maintenance on S2 faults until someone shouts (we
>>>> can maybe recommend forcing FWB off on the command line through the
>>>> cpuid override).
>>>
>>> You lost me here with your vcpu_has_run_once().
>>
>> Most likely I lost myself in the code. So the tricks we used in the past
>> for tracking the guest MMU off/on were only for the D side. If (we hope that)
>> the guest only writes instructions to a page once before executing them
>> (and never writes instructions again), we could trap a subsequent exec
>> fault and do the D-cache clean to PoC again.
>>
>>> Keeping the CMOs on exec fault is definitely manageable. But is that
>>> enough?
>>
>> Yeah, not sure it's enough if the guest keeps writing instructions to
>> the same page with the MMU off.
> 
> The difference between FWB and IDC/DIC still does my head in: my reading is that FWB
> implies IDC (but the CTR_EL0.IDC bit might not be set). This doesn't help if the wrong
> attributes are being used for instruction fetch.
> This is cache maintenance that wasn't needed before, so there are no tricks with the ID
> registers we can pull to make the guest do it.
> 
> 
> v2 of this will flip the polarity, and also detect based on an 'arm,interconnect'
> compatible string, or the existing compatible string the PMU driver uses.

Unfortunately I don't think PMUs are going to be a meaningful indicator 
in general since they don't imply anything about topology - you may 
infer that, say, an Arm CMN exists *somewhere* in the system, but it 
could conceivably be on some I/O chiplet connected via CCIX/CXL to a 
different interconnect on the CPU die which *does* still need the 
mitigation.

Thanks,
Robin.


