[PATCH] kasan: arm64: support specialized outlined tag mismatch checks

Ard Biesheuvel ardb at kernel.org
Fri Nov 20 12:45:51 EST 2020


On Fri, 13 Nov 2020 at 12:23, Mark Rutland <mark.rutland at arm.com> wrote:
>
> Hi Peter,
>
> Thanks for this; the additional detail was very helpful.
>
> The high-level idea looks fine to me, but I have some qualms with the
> ABI, and I think we should rejig the compiler side rather than taking
> this as-is. It also looks like there may be an issue with PLTs. I've
> tried to explain those in detail below.
>
> On Thu, Nov 05, 2020 at 05:12:56PM -0800, Peter Collingbourne wrote:
> > On Tue, Nov 3, 2020 at 7:28 AM Mark Rutland <mark.rutland at arm.com> wrote:
> > >
> > > Hi Peter,
> > >
> > > On Thu, Oct 29, 2020 at 01:59:44PM -0700, Peter Collingbourne wrote:
> > > > Outlined checks require a new runtime function with a custom calling
> > > > convention. Add this function to arch/arm64/lib.
> > >
> > > [...]
> > >
> > > > +/*
> > > > + * Report a tag mismatch detected by tag-based KASAN.
> > > > + *
> > > > + * This function has a custom calling convention in order to minimize the sizes
> > > > + * of the compiler-generated thunks that call it. All registers except for x16
> > > > + * and x17 must be preserved. This includes x0 and x1 which are used by the
> > > > + * caller to pass arguments, and x29 (the frame pointer register). In order to
> > > > + * allow these registers to be restored the caller spills them to stack. The 256
> > > > + * bytes of stack space allocated by the caller must be deallocated on return.
> > >
> > > This is a bit difficult to follow.
> > >
> > > Am I right in understanding that the caller has pre-allocated 256 bytes
> > > on the stack by decrementing the SP, yet it's up to the callee to undo
> > > that? That seems bizarre to me, and it would be much easier (and more
> > > generally useful) for the caller to leave it to the trampoline to
> > > allocate/free its own space.
> >
> > Yes, that's how it works. The goal was to reduce code size costs,
> > since otherwise you would need setup and teardown code in each
> > trampoline together with unwind info which will not necessarily be
> > deduplicated by the linker. It's probably easiest to think of the
> > trampoline as the "head" of the tag mismatch handling function, while
> > the runtime part is the "tail". From that perspective everything is
> > balanced.
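
(To make the head/tail picture concrete: on a return path the runtime ends up
owning the epilogue, roughly along the lines of the sketch below. This is not
the actual patch; the offsets just assume the caller spilled x0/x1 at the
bottom and x29/x30 near the top of its 256-byte area, as in the example
trampoline further down.)

        ldp     x29, x30, [sp, #232]    /* restore the spilled frame registers */
        ldp     x0, x1, [sp], #256      /* reload the arguments, free the head's stack */
        ret                             /* back into the instrumented code */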
>
> I appreciate the general idea, but there are some details that I think
> aren't quite right in the ABI, and I'd rather we rejig that before
> settling on it, especially as IIUC this was added specifically to speed
> up kernel usage, and I'd like to make sure it works well in context.
>
> Firstly, reserving 256 bytes in the compiler-generated trampoline is
> arbitrary, and I'd rather that trampoline only reserved the space it
> needs (currently 32 bytes for a stack record and x0/x1), even if that
> means the runtime trampoline has to shuffle x0/x1.
>
> Secondly, I don't believe that the compiler-generated trampoline needs to
> stack x29/x30. It makes a tail-call to __hwasan_tag_mismatch, which
> generally suggests it *shouldn't* create a frame record. It doesn't
> clobber the original x29/x30, so a frame record isn't necessary (even
> for reliable stacktrace), and it doesn't plumb the new record into x29,
> so it could leave all that work to the runtime trampoline and save an
> instruction.
>
> I think that the stacking in the compiler-generated trampoline should be
> limited to:
>
>         stp x0, x1, [sp, #-16]!
>
> ... which is much like how mfentry works on most architectures, and
> similar to how patchable-function-entry works on arm64.
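
Concretely, the mismatch slow path of each check could then shrink to
something like the sketch below (not actual compiler output; the runtime
trampoline would be left to build the frame record and allocate any further
spill space itself):

        stp     x0, x1, [sp, #-16]!     /* spill only what the slow path clobbers */
        mov     x0, x1                  /* x0 = fault address */
        mov     x1, #0                  /* x1 = encoded access description */
        b       __hwasan_tag_mismatch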
>
> > For reference here is an example of a trampoline function for the kernel:
> >
> > __hwasan_check_x1_67043328:
> >         sbfx    x16, x1, #4, #52
> >         ldrb    w16, [x9, x16]
> >         cmp     x16, x1, lsr #56
> >         b.ne    .Ltmp5
> > .Ltmp6:
> >         ret
> > .Ltmp5:
> >         lsr     x16, x1, #56
> >         cmp     x16, #255
> >         b.eq    .Ltmp6
> >         stp     x0, x1, [sp, #-256]!
> >         stp     x29, x30, [sp, #232]
> >         mov     x0, x1
> >         mov     x1, #0
> >         b       __hwasan_tag_mismatch
>
> Thanks for this; it was helpful in clarifying my understanding.
>
> I note this clobbers x16 unconditionally. Is x17 free for use too?
>
> > > > + *
> > > > + * This function takes care of transitioning to the standard AAPCS calling
> > > > + * convention and calls the C function kasan_tag_mismatch to report the error.
> > > > + *
> > > > + * Parameters:
> > > > + *   x0 - the fault address
> > > > + *   x1 - an encoded description of the faulting access
> > > > + */
> > > > +SYM_FUNC_START(__hwasan_tag_mismatch)
> > >
> > > For non-AAPCS code you should use SYM_CODE_*() rather than SYM_FUNC_*(),
> > > and place any BTI instructions necessary manually.
> >
> > Okay. Since the trampolines may appear in modules, I guess it's
> > possible for the function to be called via a module PLT, so a BTI
> > instruction is required.
>
> I think so, but I am not completely certain (there are special rules for
> branches from PLTs using x16/x17).
>

Yes, you will need a BTI C here so that indirect calls via X16 are permitted.
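
For illustration, the entry could then look something like the sketch below,
using the SYM_CODE_* annotations Mark mentions above (BTI C accepts the
BR X16 that a module PLT emits):

SYM_CODE_START(__hwasan_tag_mismatch)
        bti     c                       /* land indirect branches via x16/x17 */
        /* ... switch to AAPCS and call the C handler ... */
SYM_CODE_END(__hwasan_tag_mismatch)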

> > If I am reading the code in arch/arm64/kernel/module-plts.c correctly
> > it looks like the module PLTs may only clobber x16, correct? That
> > works for me.
>
> PLTs can clobber both x16 and x17. Do callers of the compiler-generated
> trampolines account for those being clobbered? I see from the example
> above that the trampolines do not.
>

Note that this may happen when the kernel is very big: when you build
allyesconfig, the linker may emit veneers for direct branches that fall
outside the +/-128 MB branch range, and I don't think we can assume
anything about whether those use x16, x17, or both.
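
For reference, a typical range-extension veneer boils down to something like
the sketch below (the exact sequence and register choice are up to the
linker), which is why x16/x17 cannot be assumed to survive what looks like a
direct branch:

        adrp    x16, <target>                   /* or x17, or a literal load */
        add     x16, x16, :lo12:<target>
        br      x16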

> > > > +#ifndef CONFIG_SHADOW_CALL_STACK
> > > > +     str     x18, [sp, #128]
> > > > +#endif
> > >
> > > This doesn't look right to me. The shadow call stack should remain
> > > balanced across a call, so why do you need to save x18? any manipulation
> > > within kasan_tag_mismatch should be balanced.
> > >
> > > Did you mean to save the return address to the shadow stack, rather
> > > than saving the shadow stack pointer?
> >
> > This is to handle the case where SCS is disabled. In this case, x18
> > becomes a temporary register for AAPCS functions, but this calling
> > convention requires it to be saved.
>
> My bad -- I had misread the `ifndef` as an `ifdef`, which was why I was
> confused as to why you'd save x18 when SCS was enabled, which is not
> what's going on.
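
To spell out the x18 case: with SCS enabled, x18 is the shadow call stack
pointer and stays balanced across the call, so it needs no explicit save;
with SCS disabled it is just another temporary that the C handler may
clobber, so the runtime has to spill and reload it itself, along these lines
(the restore side is assumed here; it is not part of the quoted hunk):

#ifndef CONFIG_SHADOW_CALL_STACK
        str     x18, [sp, #128]         /* x18 is caller-saved without SCS */
#endif
        /* ... set up AAPCS arguments ... */
        bl      kasan_tag_mismatch
#ifndef CONFIG_SHADOW_CALL_STACK
        ldr     x18, [sp, #128]         /* restore it for the custom convention */
#endif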
>
> Thanks,
> Mark.
>
