[PATCH v2] kasan: arm64: support specialized outlined tag mismatch checks
Peter Collingbourne
pcc at google.com
Thu Dec 3 00:14:24 EST 2020
Hi Mark,
On Tue, Nov 24, 2020 at 1:18 PM Mark Rutland <mark.rutland at arm.com> wrote:
>
>
> Hi Peter,
>
> Thanks for bearing with me on the ABI bits. My main concern now is
> making that clear, and I have some concrete suggestions below.
>
> As this depends on your other patches to the stack trace code, could you
> please make that dependency more explicit (e.g. fold the two into a
> single series)? That way we can avoid accidental breakage.
Sure, done in v3.
> On Fri, Nov 20, 2020 at 03:02:11PM -0800, Peter Collingbourne wrote:
> > Outlined checks move the tag-based ASAN checks into separate
> > functions, yielding a significant code size improvement over
> > inlined checks. Unlike the existing CONFIG_KASAN_OUTLINE mode,
> > these functions use a custom calling convention that preserves
> > most registers and is specialized to the register containing the
> > address and to the type of access, which eliminates the code size
> > and performance overhead of a standard calling convention such as
> > AAPCS for these functions.
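> >
> > To illustrate the convention, the mismatch path of one of these
> > compiler-generated check functions might look roughly like this
> > (the symbol name, the use of x9 as the shadow base, and the access
> > encoding in x1 are sketched from the userspace HWASAN scheme, not
> > quoted from the compiler's actual output):
> >
> > __hwasan_check_x1_2:
> > 	ubfx	x16, x1, #4, #52	// shadow index of the 16-byte granule
> > 	ldrb	w16, [x9, x16]		// load memory tag (x9: shadow base)
> > 	cmp	x16, x1, lsr #56	// compare with the pointer's tag
> > 	b.ne	1f
> > 	ret				// tags match: access is fine
> > 1:	stp	x0, x1, [sp, #-256]!	// allocate 256 bytes, spill x0/x1
> > 	stp	x29, x30, [sp, #232]	// spill the frame record
> > 	mov	x0, x1			// x0: the fault address
> > 	mov	x1, #2			// x1: encoded access (here a 4-byte
> > 					// read, assuming HWASAN's encoding of
> > 					// log2(size) in bits 0-3, write bit 4)
> > 	b	__hwasan_tag_mismatch	// tail call; x30 still points into
> > 					// the instrumented caller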
> >
> > This change depends on a separate series of changes to Clang [1] to
> > support outlined checks in the kernel, although the change works fine
> > without them (we just don't get outlined checks), because the flag
> > -mllvm -hwasan-inline-all-checks=0 has no effect until the Clang
> > changes land. The flag was introduced in the Clang 9.0 timeframe as
> > part of the support for outlined checks in userspace; since our
> > minimum Clang version is 10.0, we can pass it unconditionally.
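> >
> > To illustrate the flag's effect: with inlined checks the shadow load
> > and tag comparison above are emitted at every instrumented access,
> > whereas with -mllvm -hwasan-inline-all-checks=0 each access compiles
> > down to a single call to a shared check function such as the one
> > sketched above, e.g.:
> >
> > 	bl	__hwasan_check_x1_2	// illustrative name: check a
> > 					// 4-byte read through x1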
> >
> > Outlined checks require a new runtime function with a custom calling
> > convention. Add this function to arch/arm64/lib.
> >
> > I measured the code size of defconfig + tag-based KASAN, as well as
> > boot time (i.e. time until init is launched) on a DragonBoard 845c
> > with an Android arm64 GKI kernel. The results are below:
> >
> >                                  code size    boot time
> > CONFIG_KASAN_INLINE=y before      92824064        6.18s
> > CONFIG_KASAN_INLINE=y after       38822400        6.65s
> > CONFIG_KASAN_OUTLINE=y            39215616       11.48s
> >
> > We can see straight away that specialized outlined checks beat the
> > existing CONFIG_KASAN_OUTLINE=y on both code size and boot time
> > for tag-based ASAN.
> >
> > As for the comparison between CONFIG_KASAN_INLINE=y before and
> > after: we saw similar performance numbers in userspace [2] and
> > decided that, since the performance overhead is minimal compared
> > both to the overhead of tag-based ASAN itself and to the code size
> > improvement, we would simply replace the inlined checks with the
> > specialized outlined checks, with no option to select between them.
> > That is what this patch implements. We may make a different decision
> > for the kernel, however, such as having CONFIG_KASAN_OUTLINE=y turn
> > on specialized outlined checks if Clang is new enough.
> >
> > Signed-off-by: Peter Collingbourne <pcc at google.com>
> > Link: https://linux-review.googlesource.com/id/I1a30036c70ab3c3ee78d75ed9b87ef7cdc3fdb76
> > Link: [1] https://reviews.llvm.org/D90426
> > Link: [2] https://reviews.llvm.org/D56954
> > ---
> > v2:
> > - use calculations in the stack spills and restores
> > - improve the comment at the top of the function
> > - add a BTI instruction
>
> > +/*
> > + * Report a tag mismatch detected by tag-based KASAN.
> > + *
> > + * This function has a custom calling convention in order to minimize the sizes
> > + * of the compiler-generated thunks that call it. All registers except for x16
> > + * and x17 must be preserved. This includes x0 and x1 which are used by the
> > + * caller to pass arguments. In order to allow these registers to be restored
> > + * the caller spills x0 and x1 to sp+0 and sp+8. The registers x29 and x30 are
> > + * spilled to sp+232 and sp+240, and although it is not strictly necessary for
> > + * the caller to spill them, that is how the ABI for these functions has been
> > + * defined. The 256 bytes of stack space allocated by the caller must be
> > + * deallocated on return.
> > + *
> > + * This function takes care of transitioning to the standard AAPCS calling
> > + * convention and calls the C function kasan_tag_mismatch to report the error.
> > + *
> > + * Parameters:
> > + * x0 - the fault address
> > + * x1 - an encoded description of the faulting access
> > + */
>
> To make this more explicit, would you be happy with the below?
>
> /*
> * Report a tag mismatch detected by tag-based KASAN.
> *
> * A compiler-generated thunk calls this with a non-AAPCS calling
> * convention. Upon entry to this function, registers are as follows:
> *
> * x0: fault address (see below for restore)
> * x1: fault description (see below for restore)
> * x2 to x15: callee-saved
> * x16 to x17: safe to clobber
> * x18 to x30: callee-saved
> * sp: pre-decremented by 256 bytes (see below for restore)
> *
> * The caller has decremented the SP by 256 bytes, and created a
> * structure on the stack as follows:
> *
> * sp + 0..15: x0 and x1 to be restored
> * sp + 16..231: free for use
> * sp + 232..247: x29 and x30 (same as in GPRs)
> * sp + 248..255: free for use
> *
> * Note that this is not a struct pt_regs.
> *
> * To call a regular AAPCS function we must save x2 to x15 (which we can
> * store in the gaps), and create a frame record (for which we can use
> * x29 and x30 spilled by the caller as those match the GPRs).
> *
> * The caller expects x0 and x1 to be restored from the structure, and
> * for the structure to be removed from the stack (i.e. the SP must be
> * incremented by 256 prior to return).
> */
>
> > +SYM_CODE_START(__hwasan_tag_mismatch)
> > +#ifdef BTI_C
> > + BTI_C
> > +#endif
> > + add x29, sp, #232
> > + stp x2, x3, [sp, #8 * 2]
> > + stp x4, x5, [sp, #8 * 4]
> > + stp x6, x7, [sp, #8 * 6]
> > + stp x8, x9, [sp, #8 * 8]
> > + stp x10, x11, [sp, #8 * 10]
> > + stp x12, x13, [sp, #8 * 12]
> > + stp x14, x15, [sp, #8 * 14]
> > +#ifndef CONFIG_SHADOW_CALL_STACK
> > + str x18, [sp, #8 * 18]
> > +#endif
>
> Can we please add a linespace here...
>
> > + mov x2, x30
> > + bl kasan_tag_mismatch
>
> ... and one here? That'll clearly separate the save/call/restore
> sequences.
>
> > + ldp x29, x30, [sp, #8 * 29]
> > +#ifndef CONFIG_SHADOW_CALL_STACK
> > + ldr x18, [sp, #8 * 18]
> > +#endif
> > + ldp x14, x15, [sp, #8 * 14]
> > + ldp x12, x13, [sp, #8 * 12]
> > + ldp x10, x11, [sp, #8 * 10]
> > + ldp x8, x9, [sp, #8 * 8]
> > + ldp x6, x7, [sp, #8 * 6]
> > + ldp x4, x5, [sp, #8 * 4]
> > + ldp x2, x3, [sp, #8 * 2]
> > + ldp x0, x1, [sp], #256
>
> To match what we do elsewhere, please put the restore into ascending
> order, restoring x29 and x30 last. That'll match our other trampolines,
> is more forgiving for CPUs that only prefetch forwards, and it makes it
> easier to compare the save and restore sequences line-by-line.
>
> Then we can have a separate:
>
> /* remove the structure from the stack */
> add sp, sp, #256
>
> ... which is easier to match up with the calling convention description.
>
> Thanks,
> Mark.
Thanks for these suggestions. They all look good to me, so I've adopted
them as-is in v3.
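For reference, with those suggestions applied the restore sequence in
v3 looks along these lines (ascending order, with the SP adjustment
split out; a sketch of the v3 shape rather than a verbatim quote):

	/* Restore the callee-saved GPRs. */
	ldp	x0, x1, [sp, #8 * 0]
	ldp	x2, x3, [sp, #8 * 2]
	ldp	x4, x5, [sp, #8 * 4]
	ldp	x6, x7, [sp, #8 * 6]
	ldp	x8, x9, [sp, #8 * 8]
	ldp	x10, x11, [sp, #8 * 10]
	ldp	x12, x13, [sp, #8 * 12]
	ldp	x14, x15, [sp, #8 * 14]
#ifndef CONFIG_SHADOW_CALL_STACK
	ldr	x18, [sp, #8 * 18]
#endif
	ldp	x29, x30, [sp, #8 * 29]

	/* Remove the structure from the stack. */
	add	sp, sp, #256
	ret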
Peter