[PATCH v2 1/2] arm64: errata: Remove AES hwcap for COMPAT tasks

James Morse james.morse at arm.com
Thu Jul 14 09:05:19 PDT 2022


Hi Will,

On 20/04/2022 11:17, Will Deacon wrote:
> On Thu, Apr 14, 2022 at 06:43:32PM +0100, James Morse wrote:
>> On 14/04/2022 11:03, Will Deacon wrote:
>>> On Wed, Apr 13, 2022 at 06:05:44PM +0100, James Morse wrote:
>>>> Cortex-A57 and Cortex-A72 have an erratum where an interrupt that
>>>> occurs between a pair of AES instructions in aarch32 mode may corrupt
>>>> the ELR. The task will subsequently produce the wrong AES result.
>>>>
>>>> The AES instructions are part of the cryptographic extensions, which are
>>>> optional. User-space software will detect the support for these
>>>> instructions from the hwcaps. If the platform doesn't support these
>>>> instructions a software implementation should be used.
>>>>
>>>> Remove the hwcap bits on affected parts to indicate user-space should
>>>> not use the AES instructions.
>>
>>>> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
>>>> index d72c4b4d389c..3faf16f1c040 100644
>>>> --- a/arch/arm64/kernel/cpufeature.c
>>>> +++ b/arch/arm64/kernel/cpufeature.c
>>>> @@ -1922,6 +1922,12 @@ static void cpu_enable_mte(struct arm64_cpu_capabilities const *cap)
>>>>  }
>>>>  #endif /* CONFIG_ARM64_MTE */
>>>>  
>>>> +static void elf_hwcap_fixup(void)
>>>> +{
>>>> +	if (cpus_have_const_cap(ARM64_WORKAROUND_1742098))
>>>> +		compat_elf_hwcap2 &= ~COMPAT_HWCAP2_AES;
>>>> +}
>>>
>>> How does this deal with big/little if we late online an affected CPU?  It
>>> would probably be easier if we treated these CPUs as not having the 32-bit
>>> AES instructions at all (rather than removing the hwcap later), then the
>>> early cap check would prevent late onlining.
>>
>> I thought any new CPU to online late with a new errata was rejected by the
>> type == ARM64_CPUCAP_LOCAL_CPU_ERRATUM. Suzuki's documentation in cpufeature.h has:
>> | However, it is not safe if a "late" CPU requires a workaround and the system hasn't
>> | enabled it already.
>>
>> In this case verify_local_cpu_caps() would take the else for 'system_has_cap', and because
>> the cpu matches, but ARM64_CPUCAP_LOCAL_CPU_ERRATUM doesn't have the 'late cpu permitted'
>> bit set, it should call cpu_die_early().
>>
>> That said - I haven't tested this configuration. (I'll give it a go with the model)
> 
> Ah yes, that probably works, but please do test it to confirm.
> 
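
For reference, the late-onlining check described above can be sketched as a
small user-space model. This is a simplified reading of
verify_local_cpu_caps(), not the kernel code: the CAP_* names and their
values are stand-ins invented for illustration, and late_cpu_allowed() is a
hypothetical helper.

```c
#include <stdbool.h>

/* Simplified stand-ins for the kernel's capability type flags (assumed values). */
#define CAP_PERMITTED_FOR_LATE_CPU	(1u << 0) /* late CPU may have it when the system doesn't */
#define CAP_OPTIONAL_FOR_LATE_CPU	(1u << 1) /* late CPU may lack it when the system has it */

/* Errata are modelled as "optional" but not "permitted" for late CPUs. */
#define CAP_LOCAL_CPU_ERRATUM		CAP_OPTIONAL_FOR_LATE_CPU

/*
 * Returns true if a late CPU may come online, false if it would be
 * rejected (the point where the kernel would call cpu_die_early()).
 */
static bool late_cpu_allowed(unsigned int type, bool system_has_cap,
			     bool cpu_has_cap)
{
	if (system_has_cap)
		return cpu_has_cap || (type & CAP_OPTIONAL_FOR_LATE_CPU);

	/* System booted without the cap: a matching late CPU needs "permitted". */
	return !cpu_has_cap || (type & CAP_PERMITTED_FOR_LATE_CPU);
}
```

With type == CAP_LOCAL_CPU_ERRATUM, a late CPU that matches the erratum on a
system that never enabled the workaround gives
late_cpu_allowed(type, false, true) == false, which is the rejection case.
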
>> v1 did as you suggest - but the HWCAPs are built from the id registers, and touching the
>> id registers will regress KVM guest migration as the id registers are both visible to
>> Qemu, and invariant.

> Hmm, is that really something we expect to work in general? It seems to me
> that any erratum workaround which effectively removes functionality is going
> to be a blocker for migration.

The workaround is only for the host. The guest may already have the workaround. I think
these things should be kept separate unless the guest would be broken by the workaround.

We don't normally apply workarounds for EL1 from EL2 unless they are needed by the host. It's up
to the guest to have its own workaround.
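
To illustrate the separation, here is a minimal user-space model of the
approach in this patch: the compat hwcap is masked for host tasks while the
ID register value that Qemu sees is left alone. struct host_model, its field
names, OTHER_HWCAP2_BIT and elf_hwcap_fixup_model() are hypothetical names
for this sketch only; the kernel function is the elf_hwcap_fixup() quoted
above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for hwcap bits; values are assumed for this model. */
#define COMPAT_HWCAP2_AES	(1u << 0)
#define OTHER_HWCAP2_BIT	(1u << 1) /* any unrelated hwcap, for illustration */

struct host_model {
	uint64_t id_isar5;		/* ID register value visible to the VMM */
	uint32_t compat_elf_hwcap2;	/* hwcaps reported to 32-bit host tasks */
};

/* Hide AES from compat tasks on affected parts; never touch the ID register. */
static void elf_hwcap_fixup_model(struct host_model *h, bool has_erratum)
{
	if (has_erratum)
		h->compat_elf_hwcap2 &= ~COMPAT_HWCAP2_AES;
}
```

In this model only compat_elf_hwcap2 changes, so anything derived from
id_isar5 (including what a guest is shown) stays as it was.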


Thanks,

James


