[PATCH v4 6/6] perf tools: determine if LR is the return address

German Gomez german.gomez at arm.com
Fri Dec 17 03:57:10 PST 2021


Hi Mark,

Thanks for your review comments

On 15/12/2021 16:33, Mark Rutland wrote:
> Hi,
>
> On Wed, Dec 15, 2021 at 03:11:38PM +0000, German Gomez wrote:
>> From: Alexandre Truong <alexandre.truong at arm.com>
>>
>> On arm64 and frame pointer mode (e.g: perf record --callgraph fp),
>> use dwarf unwind info to check if the link register is the return
>> address in order to inject it to the frame pointer stack.
> This series looks good overall, but as a general note the commit messages are a
> bit hard to read because they jump into implementation details of the patch
> (i.e. the change the patch makes) before explaining the problem (i.e. what the
> patch is trying to solve).
>
> It would be nice to have a short introduction, e.g.

Thanks for the suggestion! I'll run through the logs to see if I can
improve them.

>
>   When unwinding using frame pointers on arm64, the return address of the
>   current leaf function may be missed. The return address of a leaf function
>   may live in the LR and/or a frame record (and the location can change within
>   a function), so it is necessary to use DWARF to identify where to look for
>   the return address at any given point during a function.
>
>   For example:
>
>   unsigned long foo(unsigned long i)
>   {
>           i += 2;
> 	  i += 5;
>   }
>
>   ... could be compiled as:
>
>   foo:
>   	// return addr in LR
>   	add	x0, x0, #2
> 	// return addr in LR
> 	stp	x29, x30, [SP, #-16]!
> 	// return addr in LR
> 	mov	x29, sp
> 	// return addr in LR *and* frame record
> 	add	x0, x0, #5
> 	// return addr in LR *and* frame record
> 	ldp	x29, x30, [sp], #16
> 	// return addr in LR
> 	ret
>
>> Write the following application:
>>
>> 	int a = 10;
>>
>> 	void f2(void)
>> 	{
>> 		for (int i = 0; i < 1000000; i++)
>> 			a *= a;
>> 	}
>>
>> 	void f1()
>> 	{
>> 		for (int i = 0; i < 10; i++)
>> 			f2();
>> 	}
>>
>> 	int main(void)
>> 	{
>> 		f1();
>> 		return 0;
>> 	}
>>
>> with the following compilation flags:
>>         gcc -fno-omit-frame-pointer -fno-inline -O2
>>
>> The compiler omits the frame pointer for f2 on arm. This is a problem
>> with any leaf call, for example an application with many different
>> calls to malloc() would always omit the calling frame, even if it
>> can be determined.
> I think the wording here is slightly misleading. For f2, the compiler *doesn't
> create a frame record*, but the frame pointer (to the caller's frame record)
> remains and is not omitted.
>
> Also, I think it's woth noting (as per the example I gave above) this applies
> to *any* function which is the current leaf function, regardless of whether
> that function creates a frame record at some point. For example, if `f1` is
> interrupted before it creates its own frame record (or after it destroys the
> frame record), the FP will point at the record created by `main` (containing
> the caller of main), and `main` itself will be missing from the unwind as it
> will only exist in the LR.

I see! I hadn't considered this. I guess it's not as likely to happen
but it's worth noting indeed.

>
>> 	./perf record --call-graph fp ./a.out
>> 	./perf report
>>
>> currently gives the following stack:
>>
>> 0xffffea52f361
>> _start
>> __libc_start_main
>> main
>> f2
>>
>> After this change, perf report correctly shows f1() calling f2(),
>> even though it was missing from the frame pointer unwind:
>>
>> 	./perf report
>>
>> 0xffffea52f361
>> _start
>> __libc_start_main
>> main
>> f1
>> f2
>>
>> Signed-off-by: Alexandre Truong <alexandre.truong at arm.com>
>> Signed-off-by: German Gomez <german.gomez at arm.com>
>> ---
>>  tools/perf/util/Build                         |  1 +
>>  .../util/arm64-frame-pointer-unwind-support.c | 63 +++++++++++++++++++
>>  .../util/arm64-frame-pointer-unwind-support.h | 10 +++
>>  tools/perf/util/machine.c                     | 19 ++++--
>>  tools/perf/util/machine.h                     |  1 +
>>  5 files changed, 89 insertions(+), 5 deletions(-)
>>  create mode 100644 tools/perf/util/arm64-frame-pointer-unwind-support.c
>>  create mode 100644 tools/perf/util/arm64-frame-pointer-unwind-support.h
>>
>> diff --git a/tools/perf/util/Build b/tools/perf/util/Build
>> index 2e5bfbb69960..03d4c647bd86 100644
>> --- a/tools/perf/util/Build
>> +++ b/tools/perf/util/Build
>> @@ -1,3 +1,4 @@
>> +perf-y += arm64-frame-pointer-unwind-support.o
>>  perf-y += annotate.o
>>  perf-y += block-info.o
>>  perf-y += block-range.o
>> diff --git a/tools/perf/util/arm64-frame-pointer-unwind-support.c b/tools/perf/util/arm64-frame-pointer-unwind-support.c
>> new file mode 100644
>> index 000000000000..4f5ecf51ed38
>> --- /dev/null
>> +++ b/tools/perf/util/arm64-frame-pointer-unwind-support.c
>> @@ -0,0 +1,63 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +#include "arm64-frame-pointer-unwind-support.h"
>> +#include "callchain.h"
>> +#include "event.h"
>> +#include "perf_regs.h" // SMPL_REG_MASK
>> +#include "unwind.h"
>> +
>> +#define perf_event_arm_regs perf_event_arm64_regs
>> +#include "../arch/arm64/include/uapi/asm/perf_regs.h"
>> +#undef perf_event_arm_regs
>> +
>> +struct entries {
>> +	u64 stack[2];
>> +	size_t length;
>> +};
>> +
>> +static bool get_leaf_frame_caller_enabled(struct perf_sample *sample)
>> +{
>> +	return callchain_param.record_mode == CALLCHAIN_FP && sample->user_regs.regs
>> +		&& sample->user_regs.mask & SMPL_REG_MASK(PERF_REG_ARM64_LR);
>> +}
>> +
>> +static int add_entry(struct unwind_entry *entry, void *arg)
>> +{
>> +	struct entries *entries = arg;
>> +
>> +	entries->stack[entries->length++] = entry->ip;
>> +	return 0;
>> +}
>> +
>> +u64 get_leaf_frame_caller_aarch64(struct perf_sample *sample, struct thread *thread, int usr_idx)
>> +{
>> +	int ret;
>> +	struct entries entries = {};
>> +	struct regs_dump old_regs = sample->user_regs;
>> +
>> +	if (!get_leaf_frame_caller_enabled(sample))
>> +		return 0;
>> +
>> +	/*
>> +	 * If PC and SP are not recorded, get the value of PC from the stack
>> +	 * and set its mask. SP is not used when doing the unwinding but it
>> +	 * still needs to be set to prevent failures.
>> +	 */
> To prevent failures where? Is this something libunwind requires?

Admittedly I haven't look very deep into libunwind, but SP seems to go
ignored when getting the last 2 entries only, so here we set it to any
value.

Thanks,
German



More information about the linux-arm-kernel mailing list