[RFC] arm64: ftrace with regs for livepatch support

Li Bin huawei.libin at huawei.com
Sat Dec 26 01:28:07 PST 2015



on 2015/11/20 19:47, AKASHI Takahiro wrote:
> In this RFC, I'd like to describe and discuss some issues on adding ftrace/
> livepatch support on arm64 before actually submitting patches. In fact,
> porting livepatch is not a complicated task, but adding "ftrace with
> regs" (CONFIG_DYNAMIC_FTRACE_WITH_REGS), which livepatch heavily relies on,
> is the real problem.
> (There is another discussion about "arch-independent livepatch" in LKML.)
>
> Under "ftrace with regs", an ftrace helper function (ftrace_regs_caller)
> will be called with cpu registers (struct pt_regs) at the beginning of
> a function if tracing is enabled on the function. Livepatch utilizes this
> argument to replace PC and jump back into a new (patched) function.
> (Please note that this feature will also be used for ftrace-based kprobes.)
>
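The PC-replacement idea described above can be sketched in plain C. This is only a userspace illustration, not kernel code: struct demo_regs, struct demo_patch and demo_livepatch_handler() are hypothetical names made up for the example, and the real struct pt_regs has more fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-in for the kernel's struct pt_regs. */
struct demo_regs {
    uint64_t regs[31];  /* x0..x30 */
    uint64_t sp;
    uint64_t pc;        /* where execution resumes after the helper returns */
};

/* Hypothetical patch record: old function entry -> replacement entry. */
struct demo_patch {
    uint64_t old_addr;
    uint64_t new_addr;
};

/*
 * Handler that a tracing helper would call with the saved registers.
 * Returns 1 if it redirected execution, 0 if no patch applies.
 */
static int demo_livepatch_handler(uint64_t traced_addr, struct demo_regs *regs,
                                  const struct demo_patch *patches, size_t n)
{
    assert(regs != NULL);
    for (size_t i = 0; i < n; i++) {
        if (patches[i].old_addr == traced_addr) {
            /* Overwrite the saved PC so the helper "returns" into the new function. */
            regs->pc = patches[i].new_addr;
            return 1;
        }
    }
    return 0;
}
```

The point of the sketch is that the handler never calls the new function itself. It only edits the saved register state, which is why the helper must see correct register values at function entry.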
> On arm64, there is no template for a function prologue, and "instruction
> scheduling" may mix it with a function body. So a helper function, which
> is inserted by gcc's "-pg" option, cannot (at least potentially) recognize
> correct values of registers because some may have already been overwritten
> at that point.
>
> Instead, x86 uses gcc's "-mfentry" option, which inserts "call __fentry__" as
> the first instruction of a function, to implement "ftrace with regs".
> As this option is arch-specific, after discussions with toolchain folks,
> we are proposing a new arch-neutral option, "-fprolog-pad=N"[1].
> This option inserts N nop instructions before a function prologue so that
> any architecture can utilize it to replace nops with whatever instruction
> sequence they want later on when required.
> (I assume that nop is very cheap in terms of performance impact.)
>
> First, let me explain how we can implement "ftrace with regs", or more
> specifically, ftrace_make_call() and ftrace_make_nop(), as well as what the
> inserted instruction sequences look like. Implementing ftrace_regs_caller
> is quite straightforward, so we don't have to care about it (at least, in
> this RFC).
>
> 1) instruction sequence
> Unlike x86, we have to preserve link register(x30) explicitly on arm64 since
> an ftrace helper function will be invoked before a function prologue. So we
> need a few instructions here, not one. Two possible ways:
>
>  (a) stp x29, x30, [sp, #-16]!
>      mov x29, sp
>      bl <mcount>
>      ldp x29, x30, [sp], #16
>      <function prologue>
>      ...
>
>  (b) mov x9, x30
>      bl <mcount>
>      mov x30, x9
>      <function prologue>
>      ...
>
> (a) complies with a normal calling convention.
> (b) is Li Bin's idea in his old patch. While (b) can save some memory
> accesses by using a scratch register(x9 in this example), we have no way
> to recover an actual value for this register.
>
>       Q#1. Which approach should we take here?
>
>
> 2) replacing an instruction sequence
>    (This issue is orthogonal to Q#1.)
>
> Replacing can happen anytime, so we have to do it (without any locking) in
> such a safe way that any task either calls a helper or doesn't call it, but
> never runs in any intermediate state.
>
> Again here, two possible ways:
>
>   (a) initialize the code in the shape of (A') at boot time,
>             (B) -> (B') -> (A')
>       then switch between (A) and (A')
>   (b) take a few steps each time. For example,
>       to enable tracing,
>             (B) -> (B') -> (A') -> (A)
>       to disable tracing,
>             (A) -> (A') -> (B') -> (B)
>       Obviously, we need cache flushing/invalidation and barriers between.
>
>     (A)                                (A')
>         stp x29, x30, [sp, #-16]!           b 1f
>         mov x29, sp                         mov x29, sp
>         bl <_mcount>                        bl <_mcount>
>         ldp x29, x30, [sp], #16             ldp x29, x30, [sp], #16
>                                          1:
>         <function prologue>
>         <function body>
>         ...
>
>     (B)                                (B')
>         nop                                 b 1f
>         nop                                 nop
>         nop                                 nop
>         nop                                 nop
>                                          1:
>         <function prologue>
>         <function body>
>         ...
>
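As a rough illustration of why the steps that only touch the first instruction, such as (B) -> (B') or (A) -> (A'), are single 32-bit stores, here is a small sketch that encodes the A64 NOP and immediate branch instructions. The helper names a64_branch() and pad_nops_to_skip() are hypothetical and are not kernel APIs. The sketch only builds the instruction words; it does not do the cache maintenance or barriers mentioned above.

```c
#include <assert.h>
#include <stdint.h>

#define A64_NOP 0xd503201fu   /* architectural NOP encoding */

/* Encode an A64 immediate branch; byte_offset is relative to the branch itself. */
static uint32_t a64_branch(int64_t byte_offset, int link)
{
    /* imm26 is a signed word offset, so the reach is +/-128 MiB. */
    assert((byte_offset & 3) == 0);
    assert(byte_offset >= -(INT64_C(1) << 27) && byte_offset < (INT64_C(1) << 27));
    uint32_t imm26 = (uint32_t)(byte_offset / 4) & 0x03ffffffu;
    return (link ? 0x94000000u : 0x14000000u) | imm26;   /* BL : B */
}

/* Hypothetical helper: (B) -> (B'), one 32-bit store into slot[0]. */
static void pad_nops_to_skip(uint32_t slot[4])
{
    slot[0] = a64_branch(4 * sizeof(uint32_t), 0);  /* b 1f, skipping the 4-word slot */
}
```

For example, starting from four A64_NOP words, pad_nops_to_skip() changes only slot[0]. The remaining three words are untouched, and those are the ones rewritten in the (B') <-> (A') step.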

Hi Takahiro,
This method cannot guarantee correctness when replacing multiple instructions
from (A') to (B') or from (B') to (A'), even under stop_machine(), especially for
a preemptible kernel or in NMI context (which will be supported on arm64 in the
future). Right?

Thanks,
Li Bin
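
One way to read the concern above is as a toy model: a task whose saved PC already points inside the 4-instruction slot resumes after the slot has been rewritten. The sketch below is such a model under that assumption. The enum, the SEQ_* tables and demo_slot_is_safe() are hypothetical names, and the model only tracks whether the x29/x30 push and pop stay paired.

```c
#include <assert.h>

enum demo_op { OP_NOP, OP_SKIP, OP_PUSH, OP_SETFP, OP_CALL, OP_POP };

/* The four slot layouts from the diagrams, one enum value per instruction. */
static const enum demo_op SEQ_A[4]  = { OP_PUSH, OP_SETFP, OP_CALL, OP_POP };
static const enum demo_op SEQ_A1[4] = { OP_SKIP, OP_SETFP, OP_CALL, OP_POP };
static const enum demo_op SEQ_B[4]  = { OP_NOP,  OP_NOP,   OP_NOP,  OP_NOP };
static const enum demo_op SEQ_B1[4] = { OP_SKIP, OP_NOP,   OP_NOP,  OP_NOP };

/*
 * Run the slot from instruction index `start` (0..3). `pushed` is the number
 * of frame pushes the task already executed before it was interrupted.
 * Returns 1 if push/pop stay paired, 0 otherwise.
 */
static int demo_slot_is_safe(const enum demo_op seq[4], int start, int pushed)
{
    assert(start >= 0 && start < 4);
    for (int i = start; i < 4; i++) {
        switch (seq[i]) {
        case OP_SKIP:
            return pushed == 0;          /* b 1f jumps past the slot */
        case OP_PUSH:
            pushed++;
            break;
        case OP_POP:
            if (pushed == 0)
                return 0;                /* pops a frame that was never pushed */
            pushed--;
            break;
        case OP_CALL:
            if (pushed == 0)
                return 0;                /* bl would clobber an unsaved x30 */
            break;
        default:
            break;
        }
    }
    return pushed == 0;
}
```

In this model every layout is safe when entered at index 0. A task interrupted at index 1 of (B) that resumes in (A') fails the check (demo_slot_is_safe(SEQ_A1, 1, 0) is 0), and so does a task interrupted at index 1 of (A) that resumes in (B') (demo_slot_is_safe(SEQ_B1, 1, 1) is 0).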

> (a) is much simpler, but (b) has a smaller performance penalty(?) when
> tracing is disabled. I'm afraid I may be oversimplifying the issue.
>
>        Q#2. Which one is preferable?
>
>
> [1] https://gcc.gnu.org/ml/gcc/2015-05/msg00267.html, and
>     https://gcc.gnu.org/ml/gcc/2015-10/msg00090.html
>
>
> Thanks,
> -Takahiro AKASHI
>




