[RFC PATCH -next v2 0/4] arm64/ftrace: support dynamic trampoline

Mark Rutland mark.rutland at arm.com
Wed May 25 05:45:13 PDT 2022


On Thu, Apr 21, 2022 at 08:37:58AM -0400, Steven Rostedt wrote:
> On Thu, 21 Apr 2022 09:13:01 +0800
> "Wangshaobo (bobo)" <bobo.shaobowang at huawei.com> wrote:
> 
> > Not yet, Steve. ftrace_location() does not seem to help us find the right
> > rec->ip in our case.
> > 
> > ftrace_location() can find the right rec->ip when the input ip is in the
> > range between sym+0 and sym+$end, but our question is how to identify
> > rec->ip from __mcount_loc,
> 
> Are you saying that the "ftrace location" is not between sym+0 and sym+$end?

IIUC yes -- this series as-is moves the call to the trampoline *before* sym+0.

Among other things that completely wrecks backtracing, so I'd *really* like to
avoid that (hence my suggested alternative).

> > this changed the patchable entry before bti to after in gcc:
> > 
> >     [1] https://reviews.llvm.org/D73680
> > 
> > gcc records the location of the first of the 5 NOPs when using
> > -fpatchable-function-entry=5,3,
> > 
> > but it does not record the first NOP after the bti, so we don't know how to
> > adjust our rec->ip for ftrace.
> 
> OK, so I do not understand how the compiler is injecting bti with mcount
> calls, so I'll just walk away for now ;-)

When using BTI, the compiler has to drop a BTI *at* the function entry point
(i.e. sym+0) for any function that can be called indirectly, but can omit this
when the function is only directly called (which is the case for most functions
created via interprocedural specialization, or for a number of static
functions).

Today, when we pass:

	-fpatchable-function-entry=2

... the compiler places 2 NOPs *after* any BTI, and records the location of the
first NOP. So the two cases we get are:

	__func_without_bti:
		NOP		<--- recorded location
		NOP

	__func_with_bti:
		BTI
		NOP		<--- recorded location
		NOP

... which works just fine, since either sym+0 or sym+4 is a reasonable
location for the patch-site to live.
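
As a rough illustration of the two layouts above (not code from this series),
the offset of the recorded location can be derived from the instruction at
sym+0. The helper names and the instruction-word array are hypothetical; the
constants are the A64 HINT encodings of BTI, BTI c, BTI j and BTI jc.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define A64_INSN_SIZE	4u
#define A64_BTI		0xd503241fu	/* HINT #32; c/j/jc set bits 6 and/or 7 */
#define A64_BTI_MASK	0xffffff3fu	/* ignore the c/j target bits */

/* Hypothetical helper: true for BTI, BTI c, BTI j and BTI jc. */
static bool insn_is_bti(uint32_t insn)
{
	return (insn & A64_BTI_MASK) == A64_BTI;
}

/*
 * Hypothetical helper: byte offset from sym+0 to the first NOP for
 * -fpatchable-function-entry=2, i.e. 0 without a BTI and 4 with one.
 */
static size_t patch_site_offset(const uint32_t *sym)
{
	return insn_is_bti(sym[0]) ? A64_INSN_SIZE : 0;
}
```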

However, if we were to pass:

	-fpatchable-function-entry=5,3

... the compiler places 3 NOPs *before* any BTI, and 2 NOPs *after* any BTI,
still recording the location of the first NOP. So in the two cases we get:

		NOP		<--- recorded location
		NOP
		NOP
	__func_without_bti:
		NOP
		NOP

		NOP		<--- recorded location
		NOP
		NOP
	__func_with_bti:
		BTI
		NOP
		NOP

... so where we want to patch one of the later NOPs to branch to a pre-function
NOP, we need to know whether or not the compiler generated a BTI. We can
discover that either by:

* Checking whether the recorded location is at sym+0 (no BTI) or sym+4 (BTI).

* Reading the instruction before the recorded location, and seeing if this is a
  BTI.

... and depending on how we handle things the two cases *might* need different
trampolines.
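
A minimal sketch of the instruction-reading approach, applied to the 5,3
layout shown earlier. It assumes the recorded location is the first of the
three pre-function NOPs, so the entry point is three instructions later. The
function names and the PRE_FUNC_NOPS constant are made up for illustration and
are not taken from this series or from the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

#define PRE_FUNC_NOPS	3		/* the "3" in -fpatchable-function-entry=5,3 */
#define A64_BTI		0xd503241fu	/* HINT #32; c/j/jc set bits 6 and/or 7 */
#define A64_BTI_MASK	0xffffff3fu	/* ignore the c/j target bits */

/* Hypothetical helper: does the function start with any form of BTI? */
static bool has_entry_bti(const uint32_t *rec)
{
	const uint32_t *sym = rec + PRE_FUNC_NOPS;	/* sym+0 */

	return (sym[0] & A64_BTI_MASK) == A64_BTI;
}

/*
 * Hypothetical helper: given the recorded location (the first pre-function
 * NOP), return the first NOP after the entry point, skipping a BTI if the
 * compiler emitted one.
 */
static const uint32_t *first_post_entry_nop(const uint32_t *rec)
{
	const uint32_t *sym = rec + PRE_FUNC_NOPS;

	return has_entry_bti(rec) ? sym + 1 : sym;
}
```

With this, the patch-site ends up at sym+0 for a function without a BTI and at
sym+4 for one with a BTI, matching the two layouts above.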

Thanks,
Mark.


