[PATCH] kasan: arm64: support specialized outlined tag mismatch checks
Peter Collingbourne
pcc at google.com
Thu Nov 5 20:12:56 EST 2020
On Tue, Nov 3, 2020 at 7:28 AM Mark Rutland <mark.rutland at arm.com> wrote:
>
> Hi Peter,
>
> On Thu, Oct 29, 2020 at 01:59:44PM -0700, Peter Collingbourne wrote:
> > Outlined checks require a new runtime function with a custom calling
> > convention. Add this function to arch/arm64/lib.
>
> [...]
>
> > +/*
> > + * Report a tag mismatch detected by tag-based KASAN.
> > + *
> > + * This function has a custom calling convention in order to minimize the sizes
> > + * of the compiler-generated thunks that call it. All registers except for x16
> > + * and x17 must be preserved. This includes x0 and x1 which are used by the
> > + * caller to pass arguments, and x29 (the frame pointer register). In order to
> > + * allow these registers to be restored the caller spills them to stack. The 256
> > + * bytes of stack space allocated by the caller must be deallocated on return.
>
> This is a bit difficult to follow.
>
> Am I right in understanding that the caller has pre-allocated 256 bytes
> on the stack by decrementing the SP, yet it's up to the callee to undo
> that? That seems bizarre to me, and it would be much easier (and more
> generally useful) for the caller to leave it to the trampoline to
> allocate/free its own space.
Yes, that's how it works. The goal was to reduce code size costs,
since otherwise you would need setup and teardown code in each
trampoline together with unwind info which will not necessarily be
deduplicated by the linker. It's probably easiest to think of the
trampoline as the "head" of the tag mismatch handling function, while
the runtime part is the "tail". From that perspective everything is
balanced.
For reference here is an example of a trampoline function for the kernel:
__hwasan_check_x1_67043328:
sbfx x16, x1, #4, #52
ldrb w16, [x9, x16]
cmp x16, x1, lsr #56
b.ne .Ltmp5
.Ltmp6:
ret
.Ltmp5:
lsr x16, x1, #56
cmp x16, #255
b.eq .Ltmp6
stp x0, x1, [sp, #-256]!
stp x29, x30, [sp, #232]
mov x0, x1
mov x1, #0
b __hwasan_tag_mismatch
> > + *
> > + * This function takes care of transitioning to the standard AAPCS calling
> > + * convention and calls the C function kasan_tag_mismatch to report the error.
> > + *
> > + * Parameters:
> > + * x0 - the fault address
> > + * x1 - an encoded description of the faulting access
> > + */
> > +SYM_FUNC_START(__hwasan_tag_mismatch)
>
> For non-AAPCS code you should use SYM_CODE_*() rather than SYM_FUNC_*(),
> and place any BTI instructions necessary manually.
Okay. Since the trampolines may appear in modules I guess it's
possible for the function to be called via a module PLT so a BTI
instruction is required.
If I am reading the code in arch/arm64/kernel/module-plts.c correctly
it looks like the module PLTs may only clobber x16, correct? That
works for me.
> We should probably add SYM_NONSTD_FUNC_*() helpers for trampolines, as
> it looks like we didn't add BTIs into the ftrace trampolines (and module
> PLTs can get there via indirect branches). I'll put together a
> point-fix for ftrace.
>
> > + add x29, sp, #232
>
> Where has this offset come from? Is there already a frame record created
> by the caller? If so, that should be mentioned in the description of the
> calling convention. If not, I don't think this code works.
Yes, the frame record is created by the caller. The description does
mention that x29 is saved but not x30 which appears after it. I will
update the description.
> > + stp x2, x3, [sp, #16]
> > + stp x4, x5, [sp, #32]
> > + stp x6, x7, [sp, #48]
> > + stp x8, x9, [sp, #64]
> > + stp x10, x11, [sp, #80]
> > + stp x12, x13, [sp, #96]
> > + stp x14, x15, [sp, #112]
>
> Is there a reason this starts at an offset of 16?
Because the caller already saved x0 and x1 at offset 0.
> For offsets please use (16 * n) like we do in entry.S, e.g.
>
> | stp x2, x3, [sp, #16 * 0]
> | stp x4, x5, [sp, #16 * 1]
> | stp x4, x5, [sp, #16 * 2]
>
> ... as it makes the relationship clearer.
Will do.
> > +#ifndef CONFIG_SHADOW_CALL_STACK
> > + str x18, [sp, #128]
> > +#endif
>
> This doesn't look right to me. The shadow call stack should remain
> balanced across a call, so why do you need to save x18? any manipulation
> within kasan_tag_mismatch should be balanced.
>
> Did you mean to save the return address to the shadow stack, rather
> than saving the shadow stack pointer?
This is to handle the case where SCS is disabled. In this case, x18
becomes a temporary register for AAPCS functions, but this calling
convention requires it to be saved.
IIRC there was a proposal to always reserve x18 in the kernel
regardless of whether SCS was enabled. If that were adopted we would
be able to drop the conditional x18 save/restore here.
> > + mov x2, x30
> > + bl kasan_tag_mismatch
>
> We didn't preserve x0 and x1 above, so they can be clobbered here, but
> the comment said we needed to preserve them.
They were saved by the caller (out of necessity, because x0 and x1 are
used to pass arguments).
> > + ldp x29, x30, [sp, #232]
>
> This doesn't look right. This function didn't save x29/x30 to the stack,
> so the only way this could work is if the caller had a redundant frame
> record pointing to itself.
The frame record was created by the caller (before this function
updated x29, so the chain of frame records is maintained).
> > +#ifndef CONFIG_SHADOW_CALL_STACK
> > + ldr x18, [sp, #128]
> > +#endif
>
> As above, this doesn't look right.
Answered above.
> > + ldp x14, x15, [sp, #112]
> > + ldp x12, x13, [sp, #96]
> > + ldp x10, x11, [sp, #80]
> > + ldp x8, x9, [sp, #64]
> > + ldp x6, x7, [sp, #48]
> > + ldp x4, x5, [sp, #32]
> > + ldp x2, x3, [sp, #16]
> > + ldp x0, x1, [sp], #256
>
> As above, we didn't save x0 and x1 -- did the caller do that?
Yes.
> Likewise, did the caller allocate 256 bytes that we have to deallocate?
Yes, as mentioned in the description above.
Peter
More information about the linux-arm-kernel
mailing list