[PATCH v2] arm64: mte: switch GCR_EL1 on task switch rather than entry/exit

Mon Jul 5 05:52:19 PDT 2021

On Fri, Jul 02, 2021 at 12:45:18PM -0700, Peter Collingbourne wrote:
> Accessing GCR_EL1 and issuing an ISB can be expensive on some
> microarchitectures. To avoid taking this performance hit on every
> kernel entry/exit, switch GCR_EL1 on task switch rather than
> entry/exit. This is essentially a revert of commit bad1e1c663e0
> ("arm64: mte: switch GCR_EL1 in kernel entry and exit").
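To make the trade-off concrete, here is a simplified model that counts GCR_EL1 writes over a trace of events under each scheme. It is a hypothetical illustration, not kernel code. The event types, function names and the skip-if-unchanged check on task switch are assumptions. The model also ignores the ISB and any difference in per-write cost.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event trace: either a kernel entry/exit round trip
 * (syscall or exception) or a switch to another task. */
enum event_kind { EV_SYSCALL, EV_SWITCH };

struct event {
	enum event_kind kind;
	unsigned int next_excl; /* user exclude mask of the incoming task */
};

/* Entry/exit scheme: GCR_EL1 is set to the kernel mask on entry and back
 * to the user mask on exit, so each round trip costs two writes. */
static size_t writes_entry_exit(const struct event *ev, size_t n)
{
	size_t writes = 0;

	assert(ev != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (ev[i].kind == EV_SYSCALL)
			writes += 2;
	return writes;
}

/* Task-switch scheme: GCR_EL1 keeps the user mask while in the kernel and
 * is only rewritten on a task switch. Skipping the write when the incoming
 * task has the same mask is an assumption of this model. */
static size_t writes_task_switch(const struct event *ev, size_t n,
				 unsigned int initial_excl)
{
	size_t writes = 0;
	unsigned int gcr_excl = initial_excl;

	assert(ev != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (ev[i].kind == EV_SWITCH && ev[i].next_excl != gcr_excl) {
			gcr_excl = ev[i].next_excl;
			writes++;
		}
	}
	return writes;
}
```

Syscall-heavy traces favour the task-switch scheme. The model cannot capture the cost of the kernel running with the user's exclusion mask.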

As per the discussion in v1, we can avoid an ISB, though we are still
left with the GCR_EL1 access. I'm surprised that access to a
non-self-synchronising register is that expensive, but I suspect the
benchmark is just timing a dummy syscall. I'm not asking for numbers, but
I'd like to make sure we don't optimise for unrealistic use-cases. Is
something like a Geekbench score affected, for example?
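For reference, a dummy-syscall benchmark of the kind suspected above might look like the sketch below. The function name and the choice of getppid as the near-empty syscall are assumptions for illustration. The sketch is Linux-only. Because the loop does almost nothing except enter and leave the kernel, any fixed per-entry cost (such as a GCR_EL1 write) dominates the result far more than it would in a real workload.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical microbenchmark: average nanoseconds per round trip into
 * the kernel for a syscall that does almost no work. */
static double ns_per_dummy_syscall(long iterations)
{
	struct timespec start, end;

	assert(iterations > 0);
	clock_gettime(CLOCK_MONOTONIC, &start);
	for (long i = 0; i < iterations; i++)
		(void)syscall(SYS_getppid); /* raw syscall, no libc shortcut */
	clock_gettime(CLOCK_MONOTONIC, &end);

	double elapsed_ns = (double)(end.tv_sec - start.tv_sec) * 1e9 +
			    (double)(end.tv_nsec - start.tv_nsec);
	return elapsed_ns / (double)iterations;
}
```

A whole-application score averages this cost over much larger amounts of useful work, which is why it is the more meaningful number.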

While we can get rid of IRG in the kernel, at some point we may want
to use ADDG instructions generated by the compiler. Those too are
affected by the GCR_EL1.Exclude mask.
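To show why ADDG depends on the mask, here is a sketch modelled on the ChooseNonExcludedTag() pseudocode in the Arm ARM, as I understand it. It should be checked against the specification before being relied on. IRG and ADDG both pick their result tag through this helper, using GCR_EL1.Exclude. The C function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Step 'offset' tags forward from 'tag' (modulo 16), never landing on a
 * tag whose bit is set in the 16-bit exclude mask. */
static uint8_t choose_non_excluded_tag(uint8_t tag, unsigned int offset,
				       uint16_t exclude)
{
	tag &= 0xf;
	assert(offset < 16); /* ADDG encodes the tag offset in 4 bits */

	if (exclude == 0xffff) /* every tag excluded: result is tag 0 */
		return 0;

	/* A zero offset still moves off an excluded starting tag. */
	if (offset == 0)
		while (exclude & (1u << tag))
			tag = (tag + 1) & 0xf;

	/* Each step advances by one, then skips any excluded tags. */
	while (offset != 0) {
		tag = (tag + 1) & 0xf;
		while (exclude & (1u << tag))
			tag = (tag + 1) & 0xf;
		offset--;
	}
	return tag;
}
```

For example, stepping 2 tags from tag 3 gives tag 5 with an empty mask but tag 6 if tag 4 is excluded. Compiler-generated ADDG in the kernel would therefore hand out different tags depending on whose mask is live in GCR_EL1.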

> This requires changing how we generate random tags for HW tag-based
> KASAN, since at this point IRG would use the user's exclusion mask,
> which may not be suitable for kernel use. In this patch I chose to take
> the modulus of CNTVCT_EL0, however alternative approaches are possible.

So a few successive calls to mte_get_random_tag() will return the same
tag if the counter hasn't changed in between. Even though ARMv8.6 requires
a 1GHz timer frequency, I think an implementation is allowed to count in
bigger increments.
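A small model of the concern follows. It takes a simulated counter value modulo the 16 MTE tags. This is a simplified, hypothetical stand-in for the approach in the patch, not its actual code. It counts how many distinct tags successive reads produce. The step sizes are arbitrary values chosen for illustration, not figures from any real implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counter-based tag source: counter value modulo 16. */
static uint8_t tag_from_counter(uint64_t cntvct)
{
	return (uint8_t)(cntvct % 16);
}

/* Number of distinct tags from 'n' successive reads of a simulated
 * counter starting at 'start' and advancing by 'step' between reads.
 * step == 0 models back-to-back reads within a single counter tick. */
static unsigned int distinct_tags(uint64_t start, uint64_t step, size_t n)
{
	uint16_t seen = 0;
	unsigned int count = 0;

	assert(n > 0);
	for (size_t i = 0; i < n; i++) {
		uint8_t tag = tag_from_counter(start + (uint64_t)i * step);
		if (!(seen & (1u << tag))) {
			seen |= (uint16_t)(1u << tag);
			count++;
		}
	}
	return count;
}
```

Reads within one tick always collide. If a counter advanced in steps that were a multiple of 16, every read would yield the same tag. A step of 4 would limit the output to 4 of the 16 tags.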

I'm inclined to NAK this patch on the grounds that we may need a
specific GCR_EL1 configuration for the kernel. Feedback to the
microarchitects: make access to this register faster.

-- 
Catalin