[PATCH] kasan: arm64: support specialized outlined tag mismatch checks

Mark Rutland mark.rutland at arm.com
Fri Nov 13 06:22:23 EST 2020


Hi Peter,

Thanks for this; the additional detail was very helpful.

The high-level idea looks fine to me, but I have some qualms with the
ABI, and I think we should rejig the compiler side rather than taking
this as-is. It also looks like there may be an issue with PLTs. I've
tried to explain those in detail below.

On Thu, Nov 05, 2020 at 05:12:56PM -0800, Peter Collingbourne wrote:
> On Tue, Nov 3, 2020 at 7:28 AM Mark Rutland <mark.rutland at arm.com> wrote:
> >
> > Hi Peter,
> >
> > On Thu, Oct 29, 2020 at 01:59:44PM -0700, Peter Collingbourne wrote:
> > > Outlined checks require a new runtime function with a custom calling
> > > convention. Add this function to arch/arm64/lib.
> >
> > [...]
> >
> > > +/*
> > > + * Report a tag mismatch detected by tag-based KASAN.
> > > + *
> > > + * This function has a custom calling convention in order to minimize the sizes
> > > + * of the compiler-generated thunks that call it. All registers except for x16
> > > + * and x17 must be preserved. This includes x0 and x1 which are used by the
> > > + * caller to pass arguments, and x29 (the frame pointer register). In order to
> > > + * allow these registers to be restored the caller spills them to stack. The 256
> > > + * bytes of stack space allocated by the caller must be deallocated on return.
> >
> > This is a bit difficult to follow.
> >
> > Am I right in understanding that the caller has pre-allocated 256 bytes
> > on the stack by decrementing the SP, yet it's up to the callee to undo
> > that? That seems bizarre to me, and it would be much easier (and more
> > generally useful) for the caller to leave it to the trampoline to
> > allocate/free its own space.
> 
> Yes, that's how it works. The goal was to reduce code size costs,
> since otherwise you would need setup and teardown code in each
> trampoline together with unwind info which will not necessarily be
> deduplicated by the linker. It's probably easiest to think of the
> trampoline as the "head" of the tag mismatch handling function, while
> the runtime part is the "tail". From that perspective everything is
> balanced.

I appreciate the general idea, but there are some details that I think
aren't quite right in the ABI, and I'd rather we rejig that before
settling on it, especially as IIUC this was added specifically to speed
up kernel usage, and I'd like to make sure it works well in context.

Firstly, reserving 256 bytes in the compiler-generated trampoline is
arbitrary, and I'd rather that trampoline only reserved the space it
needs (currently 32 bytes for a frame record and x0/x1), even if that
means the runtime trampoline has to shuffle x0/x1.

Secondly, I don't believe that the compiler-generated trampoline needs to
stack x29/x30. It makes a tail-call to __hwasan_tag_mismatch, which
generally suggests it *shouldn't* create a frame record. It doesn't
clobber the original x29/x30, so a frame record isn't necessary (even
for reliable stacktrace), and it doesn't plumb the new record into x29,
so it could leave all that work to the runtime trampoline and save an
instruction.

I think that the stacking in the compiler-generated trampoline should be
limited to:

	stp x0, x1, [sp, #-16]!

... which is much like how mfentry works on most architectures, and
similar to how patchable-function-entry works on arm64.
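
Concretely, under that scheme the slow path of each compiler-generated
check would end with something like the below (just a sketch, with
<reg> standing for the checked register and <info> for the encoded
access description):

	stp	x0, x1, [sp, #-16]!
	mov	x0, <reg>
	mov	x1, #<info>
	b	__hwasan_tag_mismatch

... leaving the runtime trampoline to stack x29/x30, build a frame
record if it wants one, shuffle x0/x1 as necessary, and pop those 16
bytes before returning.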

> For reference here is an example of a trampoline function for the kernel:
> 
> __hwasan_check_x1_67043328:
> sbfx x16, x1, #4, #52
> ldrb w16, [x9, x16]
> cmp x16, x1, lsr #56
> b.ne .Ltmp5
> .Ltmp6:
> ret
> .Ltmp5:
> lsr x16, x1, #56
> cmp x16, #255
> b.eq .Ltmp6
> stp x0, x1, [sp, #-256]!
> stp x29, x30, [sp, #232]
> mov x0, x1
> mov x1, #0
> b __hwasan_tag_mismatch

Thanks for this; it was helpful in clarifying my understanding.

I note this clobbers x16 unconditionally. Is x17 free for use too?

> > > + *
> > > + * This function takes care of transitioning to the standard AAPCS calling
> > > + * convention and calls the C function kasan_tag_mismatch to report the error.
> > > + *
> > > + * Parameters:
> > > + *   x0 - the fault address
> > > + *   x1 - an encoded description of the faulting access
> > > + */
> > > +SYM_FUNC_START(__hwasan_tag_mismatch)
> >
> > For non-AAPCS code you should use SYM_CODE_*() rather than SYM_FUNC_*(),
> > and place any BTI instructions necessary manually.
> 
> Okay. Since the trampolines may appear in modules I guess it's
> possible for the function to be called via a module PLT so a BTI
> instruction is required.

I think so, but I am not completely certain (there are special rules for
branches from PLTs using x16/x17).
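
If a BTI does turn out to be necessary, the hand-written entry would
need to look something like the below (a rough sketch; the exact BTI
idiom aside):

	SYM_CODE_START(__hwasan_tag_mismatch)
		bti	c
		/* spill the remaining registers, set up a frame record,
		 * then transition to AAPCS and call into C ... */
	SYM_CODE_END(__hwasan_tag_mismatch)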

> If I am reading the code in arch/arm64/kernel/module-plts.c correctly
> it looks like the module PLTs may only clobber x16, correct? That
> works for me.

PLTs can clobber both x16 and x17. Do callers of the compiler-generated
trampolines account for those being clobbered? I see from the example
above that the trampolines do not.
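
For reference, an arm64 module PLT entry is a small veneer along the
lines of (from memory, so please double-check module-plts.c):

	adrp	x16, <target>
	add	x16, x16, :lo12:<target>
	br	x16

... so while the current implementation happens to only write x16, the
ELF ABI permits veneers to clobber both x16 and x17, and callers
shouldn't rely on either surviving a call that might go via a PLT.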

[...]

> > > +#ifndef CONFIG_SHADOW_CALL_STACK
> > > +     str     x18, [sp, #128]
> > > +#endif
> >
> > This doesn't look right to me. The shadow call stack should remain
> > balanced across a call, so why do you need to save x18? any manipulation
> > within kasan_tag_mismatch should be balanced.
> >
> > Did you mean to save the return address to the shadow stack, rather
> > than saving the shadow stack pointer?
> 
> This is to handle the case where SCS is disabled. In this case, x18
> becomes a temporary register for AAPCS functions, but this calling
> convention requires it to be saved.

My bad -- I had misread the `ifndef` as an `ifdef`, which is why I was
confused as to why you'd save x18 when SCS was enabled; that's not
what's going on here.
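
It would be worth a comment in the asm so the next reader doesn't trip
over the same thing, e.g. something along the lines of (wording up to
you):

	#ifndef CONFIG_SHADOW_CALL_STACK
		/* Without SCS, x18 is a caller-saved temporary under AAPCS,
		 * but this calling convention requires it to be preserved,
		 * so save and restore it along with the other registers. */
		str	x18, [sp, #128]
	#endif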

Thanks,
Mark.


