Performance impact of CONFIG_FUNCTION_TRACER

Steven Rostedt rostedt at goodmis.org
Tue Jul 5 07:39:01 PDT 2022


On Tue, 5 Jul 2022 12:54:16 +0200
Sascha Hauer <sha at pengutronix.de> wrote:

> Hi,
> 
> I ran some lmbench subtests on an ARMv7 machine (NXP i.MX6q) with and
> without CONFIG_FUNCTION_TRACER enabled (with CONFIG_DYNAMIC_FTRACE
> enabled and no tracing active), see below. The Kconfig help text of this
> option reads as:
> 
> > If it's runtime disabled (the bootup default), then the overhead of
> > the instructions is very small and not measurable even in
> > micro-benchmarks.  

Well, this is true for x86 ;-)

> 
> In my tests the overhead is small, but it surely exists and is
> measurable, at least on ARMv7 machines. Is this expected? Should the
> help text be rephrased to be a little less optimistic?

You mean add "(but may vary by architecture)"?

I believe that, because ARM uses a link register for function calls, it
requires adding two 4-byte nops to every function, whereas x86 only
adds a single 5-byte nop.

Nops are very fast (they should not be processed in the CPU's
pipeline, but I don't know if that's true for every arch), but they
still affect the instruction cache: the extra 8 bytes added to each
function will cause more cache misses than when they do not exist.
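
To put rough numbers on that, here is a minimal sketch of the
arithmetic. The helper names are made up for illustration, and the
per-function sizes are just the figures mentioned above (8 bytes on
ARM, 5 bytes on x86). The function count and cache line size you pass
in are assumptions you would need to check on your own kernel and CPU.

```c
#include <stddef.h>

/*
 * Hypothetical helpers, not kernel code: estimate how much text the
 * per-function ftrace padding adds.
 */

/* Total bytes of padding for nr_funcs traceable functions. */
size_t ftrace_pad_bytes(size_t nr_funcs, size_t pad_per_func)
{
	return nr_funcs * pad_per_func;
}

/* Cache lines' worth of text that padding amounts to, rounded up. */
size_t ftrace_pad_cache_lines(size_t nr_funcs, size_t pad_per_func,
			      size_t line_size)
{
	size_t bytes = ftrace_pad_bytes(nr_funcs, pad_per_func);

	return (bytes + line_size - 1) / line_size;
}
```

For example, assuming 40000 traceable functions (a made-up count),
ftrace_pad_bytes(40000, 8) gives 320000 bytes, and with an assumed
32-byte cache line ftrace_pad_cache_lines(40000, 8, 32) gives 10000
lines' worth of extra text. This says nothing about how many extra
misses a given workload actually sees, only how much the text grows.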

Also, there are some configurations that use the old mcount, which adds
some more code to handle the mcount case.

So if this is just to have us change the Kconfig help text, I'm happy
to do that.

-- Steve



More information about the linux-arm-kernel mailing list