[PATCH v2] kasan: arm64: support specialized outlined tag mismatch checks
Peter Collingbourne
pcc at google.com
Thu Dec 3 00:14:24 EST 2020
Hi Mark,
On Tue, Nov 24, 2020 at 1:18 PM Mark Rutland <mark.rutland at arm.com> wrote:
>
>
> Hi Peter,
>
> Thanks for bearing with me on the ABI bits. My main concern now is
> making that clear, and I have some concrete suggestions below.
>
> As this depends on your other patches to the stack trace code, could you
> please make that dependency more explicit (e.g. fold the two into a
> single series)? That way we can avoid accidental breakage.
Sure, done in v3.
> On Fri, Nov 20, 2020 at 03:02:11PM -0800, Peter Collingbourne wrote:
> > Outlined checks move the tag-based ASAN checks into separate
> > functions, yielding a significant code size improvement over
> > inlined checks. Unlike the existing CONFIG_KASAN_OUTLINE mode,
> > these functions use a custom calling convention that preserves
> > most registers and is specialized to the register containing the
> > address and to the type of access, which eliminates the code size
> > and performance overhead of a standard calling convention such as
> > AAPCS for these functions.
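> >
> > To illustrate the convention, the mismatch path of one of these
> > compiler-generated check functions might look roughly like this
> > (the symbol name, the use of x9 as the shadow base, and the access
> > encoding in x1 are sketched from the userspace HWASAN scheme, not
> > quoted from the compiler's actual output):
> >
> > __hwasan_check_x1_2:
> > 	ubfx	x16, x1, #4, #52	// shadow index of the 16-byte granule
> > 	ldrb	w16, [x9, x16]		// load memory tag (x9: shadow base)
> > 	cmp	x16, x1, lsr #56	// compare with the pointer's tag
> > 	b.ne	1f
> > 	ret				// tags match: access is fine
> > 1:	stp	x0, x1, [sp, #-256]!	// allocate 256 bytes, spill x0/x1
> > 	stp	x29, x30, [sp, #232]	// spill the frame record
> > 	mov	x0, x1			// x0: the fault address
> > 	mov	x1, #2			// x1: encoded access (here a 4-byte
> > 					// read, assuming HWASAN's encoding of
> > 					// log2(size) in bits 0-3, write bit 4)
> > 	b	__hwasan_tag_mismatch	// tail call; x30 still points into
> > 					// the instrumented caller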
> >
> > This change depends on a separate series of changes to Clang [1] to
> > support outlined checks in the kernel, although the change works fine
> > without them (we just don't get outlined checks), because the flag
> > -mllvm -hwasan-inline-all-checks=0 has no effect until the Clang
> > changes land. The flag was introduced in the Clang 9.0 timeframe as
> > part of the support for outlined checks in userspace; since our
> > minimum Clang version is 10.0, we can pass it unconditionally.
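> >
> > To illustrate the flag's effect: with inlined checks the shadow load
> > and tag comparison above are emitted at every instrumented access,
> > whereas with -mllvm -hwasan-inline-all-checks=0 each access compiles
> > down to a single call to a shared check function such as the one
> > sketched above, e.g.:
> >
> > 	bl	__hwasan_check_x1_2	// illustrative name: check a
> > 					// 4-byte read through x1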
> >
> > Outlined checks require a new runtime function with a custom calling
> > convention. Add this function to arch/arm64/lib.
> >
> > I measured the code size of defconfig + tag-based KASAN, as well as
> > boot time (i.e. time until init is launched) on a DragonBoard 845c
> > with an Android arm64 GKI kernel. The results are below:
> >
> >                                  code size    boot time
> > CONFIG_KASAN_INLINE=y before      92824064        6.18s
> > CONFIG_KASAN_INLINE=y after       38822400        6.65s
> > CONFIG_KASAN_OUTLINE=y            39215616       11.48s
> >
> > We can see straight away that specialized outlined checks beat the
> > existing CONFIG_KASAN_OUTLINE=y on both code size and boot time
> > for tag-based ASAN.
> >
> > As for the comparison between CONFIG_KASAN_INLINE=y before and
> > after: we saw similar performance numbers in userspace [2] and
> > decided that, since the performance overhead is minimal compared
> > both to the overhead of tag-based ASAN itself and to the code size
> > improvement, we would simply replace the inlined checks with the
> > specialized outlined checks, with no option to select between them.
> > That is what this patch implements. We may make a different decision
> > for the kernel, however, such as having CONFIG_KASAN_OUTLINE=y turn
> > on specialized outlined checks if Clang is new enough.
> >
> > Signed-off-by: Peter Collingbourne <pcc at google.com>
> > Link: https://linux-review.googlesource.com/id/I1a30036c70ab3c3ee78d75ed9b87ef7cdc3fdb76
> > Link: [1] https://reviews.llvm.org/D90426
> > Link: [2] https://reviews.llvm.org/D56954
> > ---
> > v2:
> > - use calculations in the stack spills and restores
> > - improve the comment at the top of the function
> > - add a BTI instruction
>
> > +/*
> > + * Report a tag mismatch detected by tag-based KASAN.
> > + *
> > + * This function has a custom calling convention in order to minimize the sizes
> > + * of the compiler-generated thunks that call it. All registers except for x16
> > + * and x17 must be preserved. This includes x0 and x1 which are used by the
> > + * caller to pass arguments. In order to allow these registers to be restored
> > + * the caller spills x0 and x1 to sp+0 and sp+8. The registers x29 and x30 are
> > + * spilled to sp+232 and sp+240, and although it is not strictly necessary for
> > + * the caller to spill them, that is how the ABI for these functions has been
> > + * defined. The 256 bytes of stack space allocated by the caller must be
> > + * deallocated on return.
> > + *
> > + * This function takes care of transitioning to the standard AAPCS calling
> > + * convention and calls the C function kasan_tag_mismatch to report the error.
> > + *
> > + * Parameters:
> > + * x0 - the fault address
> > + * x1 - an encoded description of the faulting access
> > + */
>
> To make this more explicit, would you be happy with the below?
>
> /*
> * Report a tag mismatch detected by tag-based KASAN.
> *
> * A compiler-generated thunk calls this with a non-AAPCS calling
> * convention. Upon entry to this function, registers are as follows:
> *
> * x0: fault address (see below for restore)
> * x1: fault description (see below for restore)
> * x2 to x15: callee-saved
> * x16 to x17: safe to clobber
> * x18 to x30: callee-saved
> * sp: pre-decremented by 256 bytes (see below for restore)
> *
> * The caller has decremented the SP by 256 bytes, and created a
> * structure on the stack as follows:
> *
> * sp + 0..15: x0 and x1 to be restored
> * sp + 16..231: free for use
> * sp + 232..247: x29 and x30 (same as in GPRs)
> * sp + 248..255: free for use
> *
> * Note that this is not a struct pt_regs.
> *
> * To call a regular AAPCS function we must save x2 to x15 (which we can
> * store in the gaps), and create a frame record (for which we can use
> * x29 and x30 spilled by the caller as those match the GPRs).
> *
> * The caller expects x0 and x1 to be restored from the structure, and
> * for the structure to be removed from the stack (i.e. the SP must be
> * incremented by 256 prior to return).
> */
>
> > +SYM_CODE_START(__hwasan_tag_mismatch)
> > +#ifdef BTI_C
> > + BTI_C
> > +#endif
> > + add x29, sp, #232
> > + stp x2, x3, [sp, #8 * 2]
> > + stp x4, x5, [sp, #8 * 4]
> > + stp x6, x7, [sp, #8 * 6]
> > + stp x8, x9, [sp, #8 * 8]
> > + stp x10, x11, [sp, #8 * 10]
> > + stp x12, x13, [sp, #8 * 12]
> > + stp x14, x15, [sp, #8 * 14]
> > +#ifndef CONFIG_SHADOW_CALL_STACK
> > + str x18, [sp, #8 * 18]
> > +#endif
>
> Can we please add a linespace here...
>
> > + mov x2, x30
> > + bl kasan_tag_mismatch
>
> ... and one here? That'll clearly separate the save/call/restore
> sequences.
>
> > + ldp x29, x30, [sp, #8 * 29]
> > +#ifndef CONFIG_SHADOW_CALL_STACK
> > + ldr x18, [sp, #8 * 18]
> > +#endif
> > + ldp x14, x15, [sp, #8 * 14]
> > + ldp x12, x13, [sp, #8 * 12]
> > + ldp x10, x11, [sp, #8 * 10]
> > + ldp x8, x9, [sp, #8 * 8]
> > + ldp x6, x7, [sp, #8 * 6]
> > + ldp x4, x5, [sp, #8 * 4]
> > + ldp x2, x3, [sp, #8 * 2]
> > + ldp x0, x1, [sp], #256
>
> To match what we do elsewhere, please put the restore into ascending
> order, restoring x29 and x30 last. That'll match our other trampolines,
> is more forgiving for CPUs that only prefetch forwards, and it makes it
> easier to compare the save and restore sequences line-by-line.
>
> Then we can have a separate:
>
> /* remove the structure from the stack */
> add sp, sp, #256
>
> ... which is easier to match up with the calling convention description.
>
> Thanks,
> Mark.
Thanks for these suggestions. They all look good to me, so I've adopted
them as-is in v3.
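For reference, with those suggestions applied the restore sequence in
v3 looks along these lines (ascending order, with the SP adjustment
split out; a sketch of the v3 shape rather than a verbatim quote):

	/* Restore the callee-saved GPRs. */
	ldp	x0, x1, [sp, #8 * 0]
	ldp	x2, x3, [sp, #8 * 2]
	ldp	x4, x5, [sp, #8 * 4]
	ldp	x6, x7, [sp, #8 * 6]
	ldp	x8, x9, [sp, #8 * 8]
	ldp	x10, x11, [sp, #8 * 10]
	ldp	x12, x13, [sp, #8 * 12]
	ldp	x14, x15, [sp, #8 * 14]
#ifndef CONFIG_SHADOW_CALL_STACK
	ldr	x18, [sp, #8 * 18]
#endif
	ldp	x29, x30, [sp, #8 * 29]

	/* Remove the structure from the stack. */
	add	sp, sp, #256
	ret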
Peter