ftrace performance impact with different configuration

Thu Dec 29 11:21:25 EST 2011

On Thu, 2011-12-29 at 21:12 +0530, Rabin Vincent wrote:
> On Thu, Dec 29, 2011 at 14:08, Lei Wen <adrian.wenl at gmail.com> wrote:
> > 2. Seem dynamic ftrace also could involve some penalty for the running
> > system, although it patching the running kernel with nop stub...
> >
> > For the second item, is there anyone done some research before that
> > could zero the cost for the running system when the tracing is not
> > enabled yet?
> 
> One thing that needs to be fixed (for ARM) is that for the new-style
> mcounts, the nop that's currently being done is not really a nop -- it
> removes the function call, but there is still an unnecessary push/pop
> sequence.  This should be modified to have the push {lr} removed too.
> (Two instructions replaced instead of one.)

Unfortunately you can't do this, at least not when the kernel is
preemptible.

Say we have:

	push lr
	call mcount

then we convert it to:

	nop
	nop

The conversion to nop should not be an issue, and this is what would be
done when the system boots up. But by the time we enable tracing, some
low priority task could have been preempted after executing the first
nop. We then call stop machine to do the conversions (if there is no
stop machine, then let's just say a higher prio task is running while we
do the conversions) and add both the push lr and the call back. When
that lower priority task gets scheduled in again, it will look as if
it ran:

	nop
	call mcount

Since the call to mcount requires that the lr was pushed, this task
will crash when mcount returns, because the lr was never saved.
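
The interleaving above can be replayed with a toy model. This is only a
sketch under stated assumptions: the two-slot instruction list, the
run() and preempted_task_result() helpers and the "mcount pops the
saved lr" rule are all hypothetical stand-ins for illustration, not
kernel code.

```python
def run(text, pc, stack, lr):
    """Execute the toy instruction list starting at pc; return 'ok' or 'crash'."""
    while pc < len(text):
        op = text[pc]
        if op == "push lr":
            stack.append(lr)
        elif op == "call mcount":
            # Assumption: mcount returns through the lr its caller pushed.
            # An empty stack models the crash described above.
            if not stack:
                return "crash"
            stack.pop()
        pc += 1
    return "ok"


def preempted_task_result():
    text = ["nop", "nop"]      # tracing disabled: both instructions patched out
    pc, stack = 0, []
    pc += 1                    # the task executes the first nop, then is preempted
    text[0], text[1] = "push lr", "call mcount"   # tracing gets enabled meanwhile
    return run(text, pc, stack, lr=0x1234)        # the task resumes at the second slot


if __name__ == "__main__":
    print(preempted_task_result())
    print(run(["push lr", "call mcount"], 0, [], 0x1234))
```

The preempted task resumes at the second slot and hits the call without
a saved lr, while a task that enters the function after the patch runs
both instructions and is fine.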

If you don't like the push, the best thing you can do is convert to:

	jmp 1f
	call mcount
1:

This may not be as cheap as two nops, but it may be better than a push.
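
The reason the jmp form is safe is that only one instruction ever
changes (push lr <-> jmp 1f), so a preempted task cannot observe a
half-patched pair. A toy check of every preemption point, in both
patching directions, might look like the sketch below. The execute()
and all_outcomes() helpers and the "mcount pops the saved lr" rule are
hypothetical, for illustration only.

```python
ENABLED = ["push lr", "call mcount"]    # tracing on
DISABLED = ["jmp 1f", "call mcount"]    # tracing off: jump over the call


def execute(before, after, preempt_after):
    """Run `before`, swapping in `after` once `preempt_after` instructions have run."""
    text, pc, stack, executed = list(before), 0, [], 0
    while pc < len(text):
        if executed == preempt_after:
            text = list(after)          # patched while the task is switched out
        op = text[pc]
        executed += 1
        if op == "jmp 1f":
            pc = len(text)              # label 1: sits just past the call
            continue
        if op == "push lr":
            stack.append("lr")
        elif op == "call mcount":
            # Assumption: mcount returns through the lr its caller pushed.
            if not stack:
                return "crash"
            stack.pop()
        pc += 1
    return "ok"


def all_outcomes():
    """Collect results over both patch directions and every preemption point."""
    return {execute(b, a, n)
            for b, a in ((ENABLED, DISABLED), (DISABLED, ENABLED))
            for n in range(3)}


if __name__ == "__main__":
    print(all_outcomes())
```

Either the task already pushed lr before the patch and the call still
finds it, or it already took the jmp and never reaches the call.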

-- Steve