[PATCH v3 0/8] arm64/ftrace: Add support for DYNAMIC_FTRACE_WITH_CALL_OPS

Chen-Yu Tsai wenst at chromium.org
Thu Feb 23 01:28:43 PST 2023


On Mon, Jan 23, 2023 at 01:45:55PM +0000, Mark Rutland wrote:
> Hi Catalin, Steve,
> 
> I'm not sure how we want to merge this, so I've moved the core ftrace
> patch to the start of the series so that it can more easily be placed on
> a stable branch if we want that to go via the ftrace tree and the rest
> to go via arm64.
> 
> This is cleanly passing the ftrace selftests from v6.2-rc3 (results in
> the final patch).
> 
> Aside from that, usual cover letter below.
> 
> This series adds a new DYNAMIC_FTRACE_WITH_CALL_OPS mechanism, and
> enables support for this on arm64. This significantly reduces the
> overhead of tracing when a callsite/tracee has a single associated
> tracer, avoids a number of issues that make it undesirable and
> infeasible to use dynamically-allocated trampolines (e.g. branch range
> limitations), and makes it possible to implement support for
> DYNAMIC_FTRACE_WITH_DIRECT_CALLS in future.
> 
> The main idea is to give each ftrace callsite an associated pointer to
> an ftrace_ops. The architecture's ftrace_caller trampoline can recover
> the ops pointer and invoke ops->func from this without needing to use
> ftrace_ops_list_func, which has to iterate through all registered ops.
> 
> To make this work, we use -fpatchable-function-entry=M,N, where N NOPs
> are placed before the function entry point. On arm64 NOPs are always 4
> bytes, so by allocating 2 per-function NOPs, we have enough space to
> place a 64-bit value. So that we can manipulate the pointer atomically,
> we need to align instrumented functions to at least 8 bytes, which we
> can ensure using -falign-functions=8.

Does this work all the time? Or is it influenced by other Kconfig
options?

I'm getting random misaligned patch-site warnings like the following:

Misaligned patch-site gic_handle_irq+0x0/0x12c
Misaligned patch-site __traceiter_initcall_level+0x0/0x60
Misaligned patch-site __traceiter_initcall_start+0x0/0x60
Misaligned patch-site __traceiter_initcall_finish+0x0/0x68
Misaligned patch-site do_one_initcall+0x0/0x300
Misaligned patch-site do_one_initcall+0x2b0/0x300
Misaligned patch-site match_dev_by_label+0x0/0x50
Misaligned patch-site match_dev_by_uuid+0x0/0x48
Misaligned patch-site wait_for_initramfs+0x0/0x68
Misaligned patch-site panic_show_mem+0x0/0x88
Misaligned patch-site 0xffffffd3b4fef074
[...]
(I assume the unresolved symbol(s) are from modules.)

The warnings were seen on next-20230223 and many versions before, with
Debian's GCC 12.2 cross compile toolchain. I also tried next-20230223
with Linaro's toolchains gcc-linaro-12.2.1-2023.01-x86_64_aarch64-linux-gnu
and gcc-linaro-13.0.0-2022.11-x86_64_aarch64-linux-gnu and the warnings
appeared as well.

Checking panic_show_mem in various places from the Debian GCC's build:

$ aarch64-linux-gnu-nm init/initramfs.o | grep panic_show_mem
0000000000000070 t panic_show_mem
$ aarch64-linux-gnu-nm init/built-in.a | grep panic_show_mem
0000000000000070 t panic_show_mem
$ aarch64-linux-gnu-nm built-in.a | grep panic_show_mem
0000000000000070 t panic_show_mem
$ aarch64-linux-gnu-nm vmlinux.a | grep panic_show_mem
0000000000000070 t panic_show_mem
$ aarch64-linux-gnu-nm vmlinux.o | grep panic_show_mem
0000000000001534 t panic_show_mem
$ aarch64-linux-gnu-nm vmlinux | grep panic_show_mem
ffffffc0080158dc t panic_show_mem

Looks like individual object files do have functions aligned at 8-byte
boundaries, but when all the object files are collected and linked
together into vmlinux.o, the higher alignment gets dropped and some
functions end up on 4-byte boundaries.


Regards
ChenYu

> 
> Each callsite ends up looking like:
> 
> 	# Aligned to 8 bytes
> 	func - 8:
> 		< pointer to ops >
> 	func:
> 		BTI		// optional
> 		MOV	X9, LR
> 		NOP		// patched to `BL ftrace_caller`
> 	func_body:
> 
> When entering ftrace_caller, the LR points at func_body, and the
> ftrace_ops can be recovered at a negative offset from the LR value:
> 
> 	BIC	<tmp>, LR, 0x7		// Align down (skips BTI)
> 	LDR	<tmp>, [<tmp>, #-16]	// load ops pointer
> 
> The ftrace_ops::func (and any other ftrace_ops fields) can then be
> recovered from this pointer to the ops.
> 
> The first three patches enable the function alignment, working around
> cases where GCC drops alignment for cold functions or when building with
> '-Os'.
> 
> The final four patches implement support for
> DYNAMIC_FTRACE_WITH_CALL_OPS on arm64. As noted in the final patch, this
> results in a significant reduction in overhead:
> 
>   Before this series:
> 
>   Number of tracers     || Total time  | Per-call average time (ns)
>   Relevant | Irrelevant || (ns)        | Total        | Overhead
>   =========+============++=============+==============+============
>          0 |          0 ||      94,583 |         0.95 |           -
>          0 |          1 ||      93,709 |         0.94 |           -
>          0 |          2 ||      93,666 |         0.94 |           -
>          0 |         10 ||      93,709 |         0.94 |           -
>          0 |        100 ||      93,792 |         0.94 |           -
>   ---------+------------++-------------+--------------+------------
>          1 |          1 ||   6,467,833 |        64.68 |       63.73
>          1 |          2 ||   7,509,708 |        75.10 |       74.15
>          1 |         10 ||  23,786,792 |       237.87 |      236.92
>          1 |        100 || 106,432,500 |     1,064.43 |    1,063.38
>   ---------+------------++-------------+--------------+------------
>          1 |          0 ||   1,431,875 |        14.32 |       13.37
>          2 |          0 ||   6,456,334 |        64.56 |       63.62
>         10 |          0 ||  22,717,000 |       227.17 |      226.22
>        100 |          0 || 103,293,667 |     1,032.94 |    1,031.99
>   ---------+------------++-------------+--------------+------------
> 
>   Note: per-call overhead is estimated relative to the baseline case
>   with 0 relevant tracers and 0 irrelevant tracers.
> 
>   After this series:
> 
>   Number of tracers     || Total time  | Per-call average time (ns)
>   Relevant | Irrelevant || (ns)        | Total        | Overhead
>   =========+============++=============+==============+============
>          0 |          0 ||      94,541 |         0.95 |           -
>          0 |          1 ||      93,666 |         0.94 |           -
>          0 |          2 ||      93,709 |         0.94 |           -
>          0 |         10 ||      93,667 |         0.94 |           -
>          0 |        100 ||      93,792 |         0.94 |           -
>   ---------+------------++-------------+--------------+------------
>          1 |          1 ||     281,000 |         2.81 |        1.86
>          1 |          2 ||     281,042 |         2.81 |        1.87
>          1 |         10 ||     280,958 |         2.81 |        1.86
>          1 |        100 ||     281,250 |         2.81 |        1.87
>   ---------+------------++-------------+--------------+------------
>          1 |          0 ||     280,959 |         2.81 |        1.86
>          2 |          0 ||   6,502,708 |        65.03 |       64.08
>         10 |          0 ||  18,681,209 |       186.81 |      185.87
>        100 |          0 || 103,550,458 |     1,035.50 |    1,034.56
>   ---------+------------++-------------+--------------+------------
> 
>   Note: per-call overhead is estimated relative to the baseline case
>   with 0 relevant tracers and 0 irrelevant tracers.
> 
> 
> This version of the series can be found in my kernel.org git repo:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git
> 
> Tagged as:
> 
>   arm64-ftrace-per-callsite-ops-20230113
> 
> Since v1 [1]:
> * Fold in Ack from Rafael
> * Update comments/commits with description of the GCC issue
> * Move the cold attribute changes to compiler_types.h
> * Drop the unnecessary changes to the weak attribute
> * Move declaration of ftrace_ops earlier
> * Clean up and improve commit messages
> * Regenerate statistics on misaligned text symbols
> 
> Since v2 [2]:
> * Fold in Steve's Reviewed-by tag
> * Move core ftrace patch to the start of the series
> * Add ftrace selftest results to final patch
> * Use FUNCTION_ALIGNMENT_4B by default
> * Fix commit message typos
> 
> [1] https://lore.kernel.org/linux-arm-kernel/20230109135828.879136-1-mark.rutland@arm.com/
> [2] https://lore.kernel.org/linux-arm-kernel/20230113180355.2930042-1-mark.rutland@arm.com/
> 
> Thanks,
> Mark.
> 
> Mark Rutland (8):
>   ftrace: Add DYNAMIC_FTRACE_WITH_CALL_OPS
>   Compiler attributes: GCC cold function alignment workarounds
>   ACPI: Don't build ACPICA with '-Os'
>   arm64: Extend support for CONFIG_FUNCTION_ALIGNMENT
>   arm64: insn: Add helpers for BTI
>   arm64: patching: Add aarch64_insn_write_literal_u64()
>   arm64: ftrace: Update stale comment
>   arm64: Implement HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS
> 
>  arch/arm64/Kconfig                  |   4 +
>  arch/arm64/Makefile                 |   5 +-
>  arch/arm64/include/asm/ftrace.h     |  15 +--
>  arch/arm64/include/asm/insn.h       |   1 +
>  arch/arm64/include/asm/linkage.h    |   4 +-
>  arch/arm64/include/asm/patching.h   |   2 +
>  arch/arm64/kernel/asm-offsets.c     |   4 +
>  arch/arm64/kernel/entry-ftrace.S    |  32 +++++-
>  arch/arm64/kernel/ftrace.c          | 158 +++++++++++++++++++++++++++-
>  arch/arm64/kernel/patching.c        |  17 +++
>  drivers/acpi/acpica/Makefile        |   2 +-
>  include/linux/compiler_attributes.h |   6 --
>  include/linux/compiler_types.h      |  27 +++++
>  include/linux/ftrace.h              |  18 +++-
>  kernel/exit.c                       |   9 +-
>  kernel/trace/Kconfig                |   7 ++
>  kernel/trace/ftrace.c               | 109 ++++++++++++++++++-
>  17 files changed, 380 insertions(+), 40 deletions(-)
> 
> -- 
> 2.30.2
> 
> 


