[PATCH v3] arm64: mte: optimize GCR_EL1 modification on kernel entry/exit
Peter Collingbourne
pcc at google.com
Tue Jul 13 18:37:21 PDT 2021
On Tue, Jul 13, 2021 at 8:52 AM Will Deacon <will at kernel.org> wrote:
>
> On Thu, Jul 08, 2021 at 06:49:41PM -0700, Peter Collingbourne wrote:
> > Accessing GCR_EL1 and issuing an ISB can be expensive on some
> > microarchitectures. Although we must write to GCR_EL1, we can
> > restructure the code to avoid reading from it because the new value
> > can be derived entirely from the exclusion mask, which is already in
> > a GPR. Do so.
> >
> > Furthermore, although an ISB is required in order to make this system
> > register update effective, and the same is true for PAC-related updates
> > to SCTLR_EL1 or APIAKey{Hi,Lo}_EL1, we currently issue two ISBs on
> > machines that support both features even though one would suffice. To avoid
> > the unnecessary additional ISB, remove the ISBs from the PAC and
> > MTE-specific alternative blocks and add an ISB in a separate block
> > that is activated only if either feature is supported.
>
> Sorry to be a pain, but can you split this into two patches, please? I
> think you're making two distinct changes, and it would be easier to review
> and discuss them separately (it would also be interesting to know the
> relative performance improvement you get from them).
Fair enough, done.
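For reference, the GCR_EL1 half reduces to deriving the register value
straight from the exclusion mask instead of doing a read-modify-write.
Roughly (a sketch against the mte_set_gcr macro in entry.S; "excl" is an
illustrative argument name, and SYS_GCR_EL1_RRND is the bit[16] RRND
field, which the kernel always keeps set):

	.macro mte_set_gcr, excl
#ifdef CONFIG_ARM64_MTE
	/*
	 * The new GCR_EL1 value is derived entirely from the exclusion
	 * mask already held in a GPR, so the mrs/bfi read-modify-write
	 * (and its dependency on the old register value) goes away.
	 */
	orr	\excl, \excl, #SYS_GCR_EL1_RRND
	msr_s	SYS_GCR_EL1, \excl
#endif
	.endm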
> > diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> > index efed2830d141..740e09ade2ea 100644
> > --- a/arch/arm64/kernel/cpufeature.c
> > +++ b/arch/arm64/kernel/cpufeature.c
> > @@ -1717,6 +1717,20 @@ static bool has_generic_auth(const struct arm64_cpu_capabilities *entry,
> > }
> > #endif /* CONFIG_ARM64_PTR_AUTH */
> >
> > +static bool has_address_auth_or_mte(const struct arm64_cpu_capabilities *entry,
> > + int scope)
> > +{
> > +#ifdef CONFIG_ARM64_PTR_AUTH
> > + if (has_address_auth_metacap(entry, scope))
> > + return true;
> > +#endif
> > +#ifdef CONFIG_ARM64_MTE
> > + if (__system_matches_cap(ARM64_MTE))
> > + return true;
> > +#endif
> > + return false;
> > +}
> > +
> > #ifdef CONFIG_ARM64_E0PD
> > static void cpu_enable_e0pd(struct arm64_cpu_capabilities const *cap)
> > {
> > @@ -2218,6 +2232,11 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
> > .matches = has_cpuid_feature,
> > .min_field_value = 1,
> > },
> > + {
> > + .capability = ARM64_HAS_ADDRESS_AUTH_OR_MTE,
> > + .type = ARM64_CPUCAP_BOOT_CPU_FEATURE,
> > + .matches = has_address_auth_or_mte,
> > + },
>
> I'd rather avoid adding a new cap for this, as these features are entirely
> unrelated in the architecture and if we end up piling more combinations of
> features in here in the future then I fear it will become quite unwieldy.
The idea is that we wouldn't change this to
ARM64_HAS_ADDRESS_AUTH_OR_MTE_OR_SOMENEWFEATURE, but would instead
rename it to something like ARM64_REQUIRES_ISB_ON_ENTRY.
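Either way, the point is that a single combined cap keeps the entry/exit
path at one straight-line alternative block no matter how many features
need the barrier, i.e. roughly:

alternative_if ARM64_HAS_ADDRESS_AUTH_OR_MTE
	isb
alternative_else_nop_endif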
> Instead, how about we just use a conditional branch alongside the existing
> capabilities? E.g.
>
>
> alternative_if ARM64_MTE
> isb
> b 1f
> alternative_else_nop_endif
> alternative_if ARM64_HAS_ADDRESS_AUTH
> isb
> alternative_else_nop_endif
> 1:
>
> ?
That will work for now. I couldn't measure a difference at a 95%
confidence interval on my hardware, but I suspect that the chain of
branches will gradually get slower as more features are added. That's
something we can solve later though, e.g. using the new cap or an
alternative_cb callback.
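The alternative_cb route would look something like this
(kernel_exit_patch_isb is a hypothetical callback name; it would
rewrite the nop at boot once, based on which features were detected):

alternative_cb	kernel_exit_patch_isb		// hypothetical callback
	nop					// patched to "isb" if PAC or MTE is present
alternative_cb_end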
Peter