ftrace performance impact with different configuration

Philippe Rétornaz philippe.retornaz at epfl.ch
Fri Dec 30 08:07:11 EST 2011


On Thursday 29 December 2011 11:21:25 Steven Rostedt wrote:
> On Thu, 2011-12-29 at 21:12 +0530, Rabin Vincent wrote:
> > On Thu, Dec 29, 2011 at 14:08, Lei Wen <adrian.wenl at gmail.com> wrote:
> > > 2. It seems dynamic ftrace could also involve some penalty for the
> > > running system, although it patches the running kernel with a nop
> > > stub...
> > > 
> > > For the second item, has anyone done some research on how to zero
> > > the cost for the running system when tracing is not enabled yet?
> > 
> > One thing that needs to be fixed (for ARM) is that for the new-style
> > mcounts, the nop that's currently being done is not really a nop -- it
> > removes the function call, but there is still an unnecessary push/pop
> > sequence.  This should be modified to have the push {lr} removed too.
> > (Two instructions replaced instead of one.)
> 
> Unfortunately you can't do this, at least not when the kernel is
> preemptible.
> 
> Say we have:
> 
> 	push lr
> 	call mcount
> 
> then we convert it to:
> 
> 	nop
> 	nop
> 
> The conversion to nop should not be an issue, and this is what would be
> done when the system boots up. But then we enable tracing, some low
> priority task could have been preempted after executing the first nop,
> and we call stop machine to do the conversions (if no stop machine, then
> let's just say a higher prio task is running while we do the
> conversions). Then we add both the push lr and call back. But when that
> lower priority task gets scheduled in again, it would have looked like
> it ran:
> 
> 	nop
> 	call mcount
> 
> Since the call to mcount requires that the lr was pushed, this process
> will crash when the return is done and we never saved the lr.
> 
> If you don't like the push, the best thing you can do is convert to:
> 
> 	jmp 1f
> 	call mcount
> 1:
> 
> This may not be as cheap as two nops, but it may be better than a push.

Sorry if this is a bit naive, but why is it not possible to do it in two
steps?
First, call stop_machine() to put in the jmp, which skips the call to mcount.
Then wait until all tasks have hit schedule() (synchronize_sched()?).
Finally, modify both instructions to put the two nops in place, since we know
that nobody is calling mcount.
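
To make the question concrete, here is a toy user-space C model of one call
site. It is only a sketch and not kernel code: the enum, struct and function
names are all made up for illustration, and it assumes two things that the
thread does not establish. First, that mcount consumes the lr pushed just
before the call. Second, that the wait in the middle really does let every
task parked between the two instructions leave the site before the nops are
written. Under those assumptions it compares the direct conversions with the
two-step one.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy user-space model of one two-instruction call site. Not kernel code. */
enum insn { NOP, PUSH_LR, CALL_MCOUNT, JMP_OVER };

struct task {
	int pc;        /* next slot to execute: 0, 1, or 2 (past the site) */
	int pushed;    /* lr values pushed and not yet consumed */
	bool crashed;  /* CALL_MCOUNT ran with no saved lr */
};

/* Execute one instruction of the site for task t. */
static void step(struct task *t, const enum insn site[2])
{
	assert(t->pc >= 0 && t->pc < 2);
	switch (site[t->pc]) {
	case NOP:
		t->pc++;
		break;
	case PUSH_LR:
		t->pushed++;
		t->pc++;
		break;
	case CALL_MCOUNT:
		/* Modelling assumption: mcount consumes the pushed lr. */
		if (t->pushed == 0)
			t->crashed = true;
		else
			t->pushed--;
		t->pc++;
		break;
	case JMP_OVER:
		t->pc = 2;   /* the "jmp 1f" from the quoted mail */
		break;
	}
}

/* Run t to the end of the site; true if it left in a sane state. */
static bool finish_ok(struct task *t, const enum insn site[2])
{
	while (t->pc < 2)
		step(t, site);
	return !t->crashed && t->pushed == 0;
}

/* nop/nop -> push/call while a task sits between the two slots. */
static bool direct_enable_ok(void)
{
	enum insn site[2] = { NOP, NOP };
	struct task t = { 0, 0, false };

	step(&t, site);              /* first nop, then preempted */
	site[0] = PUSH_LR;
	site[1] = CALL_MCOUNT;
	return finish_ok(&t, site);  /* resumes at the call, lr never pushed */
}

/* push/call -> nop/nop in one go, same preemption point. */
static bool direct_disable_ok(void)
{
	enum insn site[2] = { PUSH_LR, CALL_MCOUNT };
	struct task t = { 0, 0, false };

	step(&t, site);              /* push done, then preempted */
	site[0] = NOP;
	site[1] = NOP;
	return finish_ok(&t, site);  /* pushed lr is never consumed */
}

/* push/call -> jmp/call -> wait -> nop/nop. */
static bool two_step_disable_ok(void)
{
	enum insn site[2] = { PUSH_LR, CALL_MCOUNT };
	struct task parked = { 0, 0, false };
	struct task during = { 0, 0, false };
	struct task after = { 0, 0, false };
	bool ok;

	step(&parked, site);                 /* push done, then preempted */
	site[0] = JMP_OVER;                  /* step 1 */
	ok = finish_ok(&parked, site);       /* assumed: the wait lets it leave */
	ok = finish_ok(&during, site) && ok; /* new entries skip the call */
	site[0] = NOP;                       /* step 2, nobody is inside now */
	site[1] = NOP;
	return finish_ok(&after, site) && ok;
}
```

Calling the three functions from a small main() shows both direct conversions
leaving the parked task in a bad state, while the two-step variant does not.
Whether synchronize_sched() actually gives the guarantee the model takes for
granted on a preemptible kernel is exactly what I am asking.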

Thanks,

Philippe




