[PATCH 4/4] arm64: ftrace: Add return address protection

Ard Biesheuvel ardb at kernel.org
Thu Dec 1 07:05:35 PST 2022


On Thu, 1 Dec 2022 at 15:40, Mark Rutland <mark.rutland at arm.com> wrote:
>
> On Thu, Dec 01, 2022 at 02:09:41PM +0100, Ard Biesheuvel wrote:
> > On Wed, 30 Nov 2022 at 18:45, Mark Rutland <mark.rutland at arm.com> wrote:
> > >
> > > On Wed, Nov 30, 2022 at 03:26:19PM +0100, Ard Biesheuvel wrote:
> > > > On Wed, 30 Nov 2022 at 15:04, Mark Rutland <mark.rutland at arm.com> wrote:
> > > > > On Tue, Nov 29, 2022 at 03:18:03PM +0100, Ard Biesheuvel wrote:
> > > > > > Use the newly added asm macros to protect and restore the return address
> > > > > > in the ftrace call wrappers, based on whichever method is active (PAC
> > > > > > and/or shadow call stack).
> > > > > >
> > > > > > If the graph tracer is in use, this covers both the return address *to*
> > > > > > the ftrace call site as well as the return address *at* the call site,
> > > > > > and the latter will either be restored in return_to_handler(), or before
> > > > > > returning to the call site.
> > > > > >
> > > > > > Signed-off-by: Ard Biesheuvel <ardb at kernel.org>
> > > > > > ---
> > > > > >  arch/arm64/kernel/entry-ftrace.S | 28 +++++++++++++++++++-
> > > > > >  1 file changed, 27 insertions(+), 1 deletion(-)
> > > > >
> > > > > As a heads-up, this code has recently changed quite significantly, and this
> > > > > won't apply to the version queued in arm64's for-next/{ftrace,core} branches.
> > > > >
> > > > > I had a direction of travel in mind with some changes for better stacktracing,
> > > > > which won't work with the approach here, so I'd prefer we do this a bit
> > > > > differently; more on that below.
> > > > >
> > > > > > diff --git a/arch/arm64/kernel/entry-ftrace.S b/arch/arm64/kernel/entry-ftrace.S
> > > > > > index 795344ab4ec45889..c744e4dd8c90a352 100644
> > > > > > --- a/arch/arm64/kernel/entry-ftrace.S
> > > > > > +++ b/arch/arm64/kernel/entry-ftrace.S
> > > > > > @@ -35,6 +35,11 @@
> > > > > >   * is missing from the LR and existing chain of frame records.
> > > > > >   */
> > > > > >       .macro  ftrace_regs_entry, allregs=0
> > > > > > +#ifdef CONFIG_FUNCTION_GRAPH_TRACER
> > > > > > +     protect_return_address x9
> > > > > > +#endif
> > > > > > +     protect_return_address x30
> > > > >
> > > > > I think if we're going to protect the callsite's original LR (x9 here), we
> > > > > should do that regardless of CONFIG_FUNCTION_GRAPH_TRACER; what matters is
> > > > > whether that's vulnerable rather than whether we intend to modify it, so I
> > > > > don't think it makes sense to protect it conditionally based on
> > > > > CONFIG_FUNCTION_GRAPH_TRACER.
> > > >
> > > > My reasoning was that if we are not going to return from it (in
> > > > return_to_handler()), we can rely on the interrupted function to
> > > > sign/authenticate it as usual. So the only reason for signing it here
> > > > is so that we can authenticate it in return_to_handler() if that
> > > > exists on the call path, removing a potentially vulnerable sequence
> > > > from that function.
> > >
> > > What I was trying to point out is that there is a window where this is spilled
> > > to the stack (and hence is potentially vulnerable) between
> > > ftrace_{caller,regs_caller}() and the end of ftrace_common().
> > >
> > > So if we don't protect this when CONFIG_FUNCTION_GRAPH_TRACER=n, it could be
> > > clobbered during that window (e.g. while function tracers are invoked),
> > > *before* we return back into the instrumented function and sign the
> > > (potentially already clobbered) value.
> >
> > Agreed.
> >
> > But to clarify, the intent of this series is not to add protection to
> > ftrace, the intent is to get rid of the gadgets from the ftrace code
> > that can be abused even if you don't use ftrace at all.
>
> Ok; sorry for missing that; I'll need to think a little harder.
>

You said it :-)

> > > Hence, my thinking is that we should sign this regardless of
> > > CONFIG_FUNCTION_GRAPH_TRACER to mitigate that case. I agree that we also want
> > > it to be signed while it's in the graph return stack (i.e. until the
> > > instrumented function returns back to return_to_handler()). In general, we
> > > should sign the value if it's going to be spilled to the stack.
> >
> > Sure, but it solves a different problem.
>
> Fair enough!
>
> I think we're agreed that something which solves both issues makes sense, even
> if that's not necessary for the gadgetisation issue specifically?
>

Of course.

So the issue we are talking about here is that you might be able to
attack the ftrace infrastructure while it is in use, so that the
function return from ftrace_common() is made to point somewhere else.
I agree that this is something we might want to harden, and I also
wonder whether we should perhaps insert three NOPs instead of two, or
teach the compiler to put its PACIASP right after, so that we can use
BR instead of RET to perform the return.
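
To make that concrete, the return path in question looks roughly like
this (a simplified sketch from memory, so the exact labels/offsets may
differ from what is queued in for-next):

	/* Restore the callsite's FP, LR, PC */
	ldr	x29, [sp, #S_FP]
	ldr	x30, [sp, #S_LR]
	ldr	x9, [sp, #S_PC]

	/* Restore the callsite's SP */
	add	sp, sp, #PT_REGS_SIZE + 16

	ret	x9	// x9 was just reloaded from memory, so this is the
			// unauthenticated indirect branch being discussed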

But again, this is ground that I am currently not attempting to cover.

> > > > > I'm a bit worried this might confuse some ftrace code manipulating the return
> > > > > address (e.g. manipulation of the ftrace graph return stack), as I don't think
> > > > > that's all PAC-clean, and might need some modification.
> > > >
> > > > This is the reason for the xpaci instruction below.
> > >
> > > Unfortunately, that alone isn't sufficient.
> > >
> > > What I was alluding to is that this change means the ftrace graph return stack
> > > contains signed addresses, and other code doesn't expect that. For example,
> > > arm64's stacktrace code currently depends on the graph return stack containing
> > > plain pointers, and so that gets broken as of this patch when function graph
> > > tracing is enabled:
> > >
> > > | # uname -a
> > > | # Linux buildroot 6.1.0-rc7-00003-g44a67f0b8ac7 #1 SMP PREEMPT Wed Nov 30 17:19:38 GMT 2022 aarch64 GNU/Linux
> > > | # cat /proc/self/stack
> > > | [<0>] proc_pid_stack+0xc0/0x130
> > > | [<0>] proc_single_show+0x68/0x120
> > > | [<0>] seq_read_iter+0x16c/0x45c
> > > | [<0>] seq_read+0x98/0xd0
> > > | [<0>] vfs_read+0xc8/0x2c0
> > > | [<0>] ksys_read+0x78/0x110
> > > | [<0>] __arm64_sys_read+0x24/0x30
> > > | [<0>] invoke_syscall+0x50/0x120
> > > | [<0>] el0_svc_common.constprop.0+0xd4/0xf4
> > > | [<0>] do_el0_svc+0x34/0xd0
> > > | [<0>] el0_svc+0x2c/0x84
> > > | [<0>] el0t_64_sync_handler+0xf4/0x120
> > > | [<0>] el0t_64_sync+0x18c/0x190
> > > | # echo function_graph > /sys/kernel/debug/tracing/current_tracer
> > > | # cat /proc/self/stack
> > > | [<0>] 0xf5f98000083dff40
> > > | [<0>] 0xd6b88000083e0f68
> > > | [<0>] 0x21ac800008381ad0
> > > | [<0>] 0xd0bc800008381e58
> > > | [<0>] 0x22b280000834bc28
> > > | [<0>] 0xf0ca80000834c5c8
> > > | [<0>] 0x299080000834c684
> > > | [<0>] 0xb1a1800008029cf0
> > > | [<0>] 0x9bd0800008029e94
> > > | [<0>] 0x1788800008029ee8
> > > | [<0>] 0xa08680000916dd5c
> > > | [<0>] el0t_64_sync_handler+0xf4/0x120
> > > | [<0>] el0t_64_sync+0x18c/0x190
> > >
> > > That's unfortunate (and would break RELIABLE_STACKTRACE, which we're slowly
> > > getting towards being able to implement), but it's simple enough to account for
> > > in the stacktrace code.
> > >
> >
> > Indeed. Those functions should just strip the PAC bits, no?
>
> For that case, yup. That was roughly what I meant about it being simple to deal
> with in the stacktrace code. :)
>

Right. So given that this is an issue for PAC but not for the shadow
call stack, we might consider a shorter-term fix where we push/pop
these addresses to/from the shadow call stack, and address the PAC
stripping more comprehensively once we get around to it.
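
I.e., something like the following SCS-only variant of the macros (a
hypothetical sketch, assuming x18 carries the shadow call stack pointer
as usual; the macro names just mirror the ones used in this series):

	.macro	protect_return_address, reg
#ifdef CONFIG_SHADOW_CALL_STACK
	str	\reg, [x18], #8		// push \reg onto the shadow call stack
#endif
	.endm

	.macro	restore_return_address, reg
#ifdef CONFIG_SHADOW_CALL_STACK
	ldr	\reg, [x18, #-8]!	// pop the protected copy back into \reg
#endif
	.endm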

> > > I have a fear that there are other cases where code tries to consume the graph
> > > return stack (or to match against entries within it), which would be similarly
> > > broken. I vaguely recall that we had issues of that shape in the past when we
> > > tried to adjust the reported PC value, and would need to go page that in to
> > > check that we don't open a similar issue here.
> >
> > OK
>
> FWIW, I'm happy to go audit that, I just wanted to make sure we didn't forget
> to do so, since it's not obvious that there are potential issues there.
>

Great.

> > > > > >       /* Restore the callsite's SP */
> > > > > >       add     sp, sp, #PT_REGS_SIZE + 16
> > > > > >
> > > > > > +     restore_return_address x9
> > > > > > +#ifdef CONFIG_FUNCTION_GRAPH_TRACER
> > > > > > +     /* compare the original return address with the actual one */
> > > > > > +     cmp     x10, x30
> > > > > > +     b.ne    0f
> > > > > > +
> > > > > > +     /*
> > > > > > +      * If they are the same, unprotect it now. If it was modified, it will
> > > > > > +      * be dealt with in return_to_handler() below.
> > > > > > +      */
> > > > > > +     restore_return_address x30
> > > > > > +0:
> > > > > > +#endif
> > > > > >       ret     x9
>
> > > > > As above, I'd prefer that we had ftrace_graph_func() fix things up so that we
> > > > > can unconditionally authenticate things here, which would be a bit stronger and
> > > > > simpler to reason about.
> > > > >
> > > >
> > > > I think having all in one place makes it much easier to reason about,
> > > > tbh. Adding additional handling of the PAC state as well as the shadow
> > > > call stack in ftrace_graph_func() seems much more fiddly to me.
> > >
> > > I appreciate that concern, but my intuition here is the inverse; I'd like to
> > > avoid the conditionality in the regular tracing path to make that clearly
> > > balanced and (from my perspective) easier to reason about.
> > >
> > > I'm happy if we have to do a bit more work in ftrace_graph_func() and
> > > return_to_handler() since those are already more special anyway.
> > >
> >
> > Fair enough. As long as the asm routines have a SCS pop or AUTIASP
> > between reloading x30 and returning to it, I don't have any problems
> > with that.
>
> Sure; I think that's workable. I have a rough shape in mind, so I'll have a go
> at that as an example and try to get back to you shortly.
>

Thanks.
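
To spell out what I mean by that, the epilogues of those routines would
end up with the shape below (a hypothetical sketch, assuming the return
address was signed against SP when it was spilled):

	ldp	x29, x30, [sp], #16	// reload the frame record, incl. the
					// signed return address
	autiasp				// authenticate x30 against SP (NOP on
					// cores without pointer auth)
	ret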

> With that in mind, I think we should also fix up
> qcom_link_stack_sanitisation(), since that ends up creating a gadget of the form:
>
>         MOV     X30, Xn
>         RET
>
> ... and that can be fixed by leaving it to the compiler to save/restore x30,
> whereupon it should create a frame record and all the usual PAC goodness.
> Example patch below (reformatted into the usual arm64 inline asm style).
>
> Thanks,
> Mark.
>
> ---->8----
> diff --git a/arch/arm64/kernel/proton-pack.c b/arch/arm64/kernel/proton-pack.c
> index bfce41c2a53b3..9fc54facf1ccb 100644
> --- a/arch/arm64/kernel/proton-pack.c
> +++ b/arch/arm64/kernel/proton-pack.c
> @@ -250,12 +250,13 @@ static noinstr void qcom_link_stack_sanitisation(void)
>  {
>         u64 tmp;
>
> -       asm volatile("mov       %0, x30         \n"
> -                    ".rept     16              \n"
> -                    "bl        . + 4           \n"
> -                    ".endr                     \n"
> -                    "mov       x30, %0         \n"
> -                    : "=&r" (tmp));
> +       asm volatile(
> +       "       .rept   16              \n"
> +       "       bl      . + 4           \n"
> +       "       .endr                   \n"
> +       : "=&r" (tmp)
> +       :
> +       : "x30");
>  }
>

Yeah, I'm sure that's the last one :-)
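
FWIW, with x30 in the clobber list I'd expect the compiler to wrap the
asm in the usual protected prologue/epilogue, roughly along these lines
(a sketch of the expected codegen, assuming branch protection is
enabled, not actual compiler output):

qcom_link_stack_sanitisation:
	paciasp				// sign x30 against SP
	stp	x29, x30, [sp, #-16]!	// create a frame record
	mov	x29, sp
	...				// the .rept 16 / bl . + 4 loop
	ldp	x29, x30, [sp], #16
	autiasp				// authenticate x30 before returning
	ret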


