[PATCH v6 3/6] arm64: ftrace: fix a stack tracer's output under function graph tracer
AKASHI Takahiro
takahiro.akashi at linaro.org
Wed Nov 25 19:05:38 PST 2015
Jungseok,
On 11/25/2015 08:48 PM, Jungseok Lee wrote:
> On Nov 25, 2015, at 2:29 PM, AKASHI Takahiro wrote:
>> On 11/24/2015 10:37 PM, Jungseok Lee wrote:
>>> On Nov 18, 2015, at 3:43 PM, AKASHI Takahiro wrote:
>>>
>>> Hi Akashi,
>>>
>>>> Function graph tracer modifies a return address (LR) in a stack frame
>>>> to hook a function return. This will result in many useless entries
>>>> (return_to_handler) showing up in
>>>> a) a stack tracer's output
>>>> b) perf call graph (with perf record -g)
>>>> c) dump_backtrace (at panic et al.)
>>>>
>>>> For example, in case of a),
>>>> $ echo function_graph > /sys/kernel/debug/tracing/current_tracer
>>>> $ echo 1 > /proc/sys/kernel/stack_trace_enabled
>>>> $ cat /sys/kernel/debug/tracing/stack_trace
>>>> Depth Size Location (54 entries)
>>>> ----- ---- --------
>>>> 0) 4504 16 gic_raise_softirq+0x28/0x150
>>>> 1) 4488 80 smp_cross_call+0x38/0xb8
>>>> 2) 4408 48 return_to_handler+0x0/0x40
>>>> 3) 4360 32 return_to_handler+0x0/0x40
>>>> ...
>>>>
>>>> In case of b),
>>>> $ echo function_graph > /sys/kernel/debug/tracing/current_tracer
>>>> $ perf record -e mem:XXX:x -ag -- sleep 10
>>>> $ perf report
>>>> ...
>>>> | | |--0.22%-- 0x550f8
>>>> | | | 0x10888
>>>> | | | el0_svc_naked
>>>> | | | sys_openat
>>>> | | | return_to_handler
>>>> | | | return_to_handler
>>>> ...
>>>>
>>>> In case of c),
>>>> $ echo function_graph > /sys/kernel/debug/tracing/current_tracer
>>>> $ echo c > /proc/sysrq-trigger
>>>> ...
>>>> Call trace:
>>>> [<ffffffc00044d3ac>] sysrq_handle_crash+0x24/0x30
>>>> [<ffffffc000092250>] return_to_handler+0x0/0x40
>>>> [<ffffffc000092250>] return_to_handler+0x0/0x40
>>>> ...
>>>>
>>>> This patch replaces such entries with real addresses preserved in
>>>> current->ret_stack[] at unwind_frame(). This way, we can cover all
>>>> the cases.
>>>
>>> I've observed a strange behavior when playing with case c). Call trace
>>> is as follows when function_graph is not used.
>>>
>>> Call trace:
>>> [<fffffe00003dc738>] sysrq_handle_crash+0x24/0x30 <- (1)
>>> [<fffffe00003dd2ac>] __handle_sysrq+0x128/0x19c <- (2)
>>> [<fffffe00003dd730>] write_sysrq_trigger+0x60/0x74
>>> [<fffffe0000249fc4>] proc_reg_write+0x84/0xc0
>>> [<fffffe00001f2638>] __vfs_write+0x44/0x104 <- (3)
>>> [<fffffe00001f2e60>] vfs_write+0x98/0x1a8
>>> [<fffffe00001f3730>] SyS_write+0x50/0xb0
>>> [<fffffe00000939ec>] el0_svc_naked+0x20/0x28 <- (4)
>>>
>>> When function_graph is set, some entries, such as do_mem_abort, are added
>>> between (1) and (2). In addition, entries from (3) to (4) are not printed
>>> out. As tracking down elements of ret_stack[], I realize dump_backtrace()
>>> has been terminated before reaching to ret_stack[0]. Have you seen this
>>> kind of behavior? I believe push & pop operations work correctly.
>>>
>>> Please correct me if I'm wrong.
>>
>> Oops, I mis-interpreted the result output.
>> You are right.
>> This can happen because the original dump_backtrace() wants to trace a stack
>> from a function where an exception has taken place, sysrq_handle_crash(), while
>> ret_stack[curr_ret_stack] doesn't point to that function, but the very top of
>> traced functions in callchains, that is, __do_kernel_fault in your case, probably.
>
> Yes, __do_kernel_fault.
>
>> So it results in replacing entries of return_to_handler to wrong function addresses.
>>
>> A fixup! patch attached below (informative only) fixes this issue by once tracing
>> all the functions on a stack, but preventing a top few ones from being printed.
>> But there is a tricky thing here: we have to use 'regs->pc' instead of frame.pc
>> as a trapped function because, as I've already mentioned before, we always miss
>> the function when walking through a stack from an exception handler to functions
>> in a thread context.
>> (Please note that we will introduce a dummy stack frame at interrupt, but not
>> at exception.)
>
> Thanks for clear explanation!
>
> Since I'm not the only person to experience the above case, I report one more and
> the last observation ;)
>
> PC and LR information is printed out when function_graph is set.
>
> PC is at sysrq_handle_crash+0x24/0x30
> LR is at sysrq_handle_crash+0x10/0x30
>
> The logs are as follows when function_graph turns off.
>
> PC is at sysrq_handle_crash+0x24/0x30
> LR is at __handle_sysrq+0x128/0x19c
>
> I think __show_regs() might have a similar problem when retrieving LR according to
> the below informative patch. Thoughts?
Well, I believe that it is normal.
Sysrq_handle_crash() is basically a leaf function, but once function or function graph
tracer is turned on, ftrace_caller() is called before its function body and so LR has
been modified when panic/show_regs().
Thanks,
-Takahiro AKASHI
>>>> Signed-off-by: AKASHI Takahiro <takahiro.akashi at linaro.org>
>>>> ---
>>>> arch/arm64/include/asm/ftrace.h | 2 ++
>>>> arch/arm64/include/asm/stacktrace.h | 3 +++
>>>> arch/arm64/kernel/perf_callchain.c | 3 +++
>>>> arch/arm64/kernel/process.c | 3 +++
>>>> arch/arm64/kernel/return_address.c | 3 +++
>>>> arch/arm64/kernel/stacktrace.c | 17 +++++++++++++++++
>>>> arch/arm64/kernel/time.c | 3 +++
>>>> arch/arm64/kernel/traps.c | 3 +++
>>>> 8 files changed, 37 insertions(+)
>>>>
>>>> diff --git a/arch/arm64/include/asm/ftrace.h b/arch/arm64/include/asm/ftrace.h
>>>> index c5534fa..3c60f37 100644
>>>> --- a/arch/arm64/include/asm/ftrace.h
>>>> +++ b/arch/arm64/include/asm/ftrace.h
>>>> @@ -28,6 +28,8 @@ struct dyn_arch_ftrace {
>>>>
>>>> extern unsigned long ftrace_graph_call;
>>>>
>>>> +extern void return_to_handler(void);
>>>> +
>>>> static inline unsigned long ftrace_call_adjust(unsigned long addr)
>>>> {
>>>> /*
>>>> diff --git a/arch/arm64/include/asm/stacktrace.h b/arch/arm64/include/asm/stacktrace.h
>>>> index 6fb61c5..801a16db 100644
>>>> --- a/arch/arm64/include/asm/stacktrace.h
>>>> +++ b/arch/arm64/include/asm/stacktrace.h
>>>> @@ -22,6 +22,9 @@ struct stackframe {
>>>> unsigned long fp;
>>>> unsigned long sp;
>>>> unsigned long pc;
>>>> +#ifdef CONFIG_FUNCTION_GRAPH_TRACER
>>>> + unsigned int graph;
>>>> +#endif
>>>> };
>>>
>>> How about using int instead of unsigned int to align with cure_ret_stack
>>> of struct task_struct?
>>>
>>>> extern int unwind_frame(struct task_struct *tsk, struct stackframe *frame);
>>>> diff --git a/arch/arm64/kernel/perf_callchain.c b/arch/arm64/kernel/perf_callchain.c
>>>> index 797220d..ff46654 100644
>>>> --- a/arch/arm64/kernel/perf_callchain.c
>>>> +++ b/arch/arm64/kernel/perf_callchain.c
>>>> @@ -164,6 +164,9 @@ void perf_callchain_kernel(struct perf_callchain_entry *entry,
>>>> frame.fp = regs->regs[29];
>>>> frame.sp = regs->sp;
>>>> frame.pc = regs->pc;
>>>> +#ifdef CONFIG_FUNCTION_GRAPH_TRACER
>>>> + frame.graph = current->curr_ret_stack;
>>>> +#endif
>>>>
>>>> walk_stackframe(current, &frame, callchain_trace, entry);
>>>> }
>>>> diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
>>>> index 98bf546..88d742b 100644
>>>> --- a/arch/arm64/kernel/process.c
>>>> +++ b/arch/arm64/kernel/process.c
>>>> @@ -344,6 +344,9 @@ unsigned long get_wchan(struct task_struct *p)
>>>> frame.fp = thread_saved_fp(p);
>>>> frame.sp = thread_saved_sp(p);
>>>> frame.pc = thread_saved_pc(p);
>>>> +#ifdef CONFIG_FUNCTION_GRAPH_TRACER
>>>> + frame.graph = p->curr_ret_stack;
>>>> +#endif
>>>> stack_page = (unsigned long)task_stack_page(p);
>>>> do {
>>>> if (frame.sp < stack_page ||
>>>> diff --git a/arch/arm64/kernel/return_address.c b/arch/arm64/kernel/return_address.c
>>>> index 07b37ac..1718706 100644
>>>> --- a/arch/arm64/kernel/return_address.c
>>>> +++ b/arch/arm64/kernel/return_address.c
>>>> @@ -43,6 +43,9 @@ void *return_address(unsigned int level)
>>>> frame.fp = (unsigned long)__builtin_frame_address(0);
>>>> frame.sp = current_stack_pointer;
>>>> frame.pc = (unsigned long)return_address; /* dummy */
>>>> +#ifdef CONFIG_FUNCTION_GRAPH_TRACER
>>>> + frame.graph = current->curr_ret_stack;
>>>> +#endif
>>>>
>>>> walk_stackframe(current, &frame, save_return_addr, &data);
>>>>
>>>> diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
>>>> index 9c7acf8..0a39049 100644
>>>> --- a/arch/arm64/kernel/stacktrace.c
>>>> +++ b/arch/arm64/kernel/stacktrace.c
>>>> @@ -17,6 +17,7 @@
>>>> */
>>>> #include <linux/kernel.h>
>>>> #include <linux/export.h>
>>>> +#include <linux/ftrace.h>
>>>> #include <linux/sched.h>
>>>> #include <linux/stacktrace.h>
>>>>
>>>> @@ -66,6 +67,19 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
>>>> frame->fp = *(unsigned long *)(fp);
>>>> frame->pc = *(unsigned long *)(fp + 8);
>>>>
>>>> +#ifdef CONFIG_FUNCTION_GRAPH_TRACER
>>>> + if (tsk && tsk->ret_stack &&
>>>> + (frame->pc == (unsigned long)return_to_handler)) {
>>>> + /*
>>>> + * This is a case where function graph tracer has
>>>> + * modified a return address (LR) in a stack frame
>>>> + * to hook a function return.
>>>> + * So replace it to an original value.
>>>> + */
>>>> + frame->pc = tsk->ret_stack[frame->graph--].ret;
>>>> + }
>>>> +#endif /* CONFIG_FUNCTION_GRAPH_TRACER */
>>>
>>> There is an index check of ret_stack[] in case of x86 [1]. Even though
>>> graph is unsigned int, I think we need to check the value of frame->graph
>>> before accessing ret_stack[].
>>
>> I'm not sure that the checking is very useful because, if it happens,
>> it is a bug. It might make sense to avoid a possible panic though.
>>
>>>> +
>>>> /*
>>>> * Check whether we are going to walk through from interrupt stack
>>>> * to task stack.
>>>> @@ -137,6 +151,9 @@ void save_stack_trace_tsk(struct task_struct *tsk, struct stack_trace *trace)
>>>> frame.sp = current_stack_pointer;
>>>> frame.pc = (unsigned long)save_stack_trace_tsk;
>>>> }
>>>> +#ifdef CONFIG_FUNCTION_GRAPH_TRACER
>>>> + frame.graph = tsk->curr_ret_stack;
>>>> +#endif
>>>>
>>>> walk_stackframe(tsk, &frame, save_trace, &data);
>>>> if (trace->nr_entries < trace->max_entries)
>>>> diff --git a/arch/arm64/kernel/time.c b/arch/arm64/kernel/time.c
>>>> index 6e5c521..5977969 100644
>>>> --- a/arch/arm64/kernel/time.c
>>>> +++ b/arch/arm64/kernel/time.c
>>>> @@ -52,6 +52,9 @@ unsigned long profile_pc(struct pt_regs *regs)
>>>> frame.fp = regs->regs[29];
>>>> frame.sp = regs->sp;
>>>> frame.pc = regs->pc;
>>>> +#ifdef CONFIG_FUNCTION_GRAPH_TRACER
>>>> + frame.graph = -1; /* no task info */
>>>> +#endif
>>>
>>> graph is unsigned int type. Is this intentional?
>>
>> No. This initialization is, I believe, redundant as it is not checked anywhere,
>> but I will re-think of it along with the checking above.
>>
>> Thanks,
>> -Takahiro AKASHI
>>
>>> Best Regards
>>> Jungseok Lee
>>>
>>> [1] arch/x86/kernel/dumpstack.c
>>>
>> ----8<----
>> From 9ebba7167f7838daf68d8231f04141d2f4d4b7b5 Mon Sep 17 00:00:00 2001
>> From: AKASHI Takahiro <takahiro.akashi at linaro.org>
>> Date: Wed, 25 Nov 2015 13:29:54 +0900
>> Subject: [PATCH] fixup! arm64: ftrace: fix a stack tracer's output under
>> function graph tracer
>>
>>
>> Signed-off-by: AKASHI Takahiro <takahiro.akashi at linaro.org>
>> ---
>> arch/arm64/kernel/traps.c | 23 +++++++++++++++++------
>> 1 file changed, 17 insertions(+), 6 deletions(-)
>>
>> diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
>> index 46053c2..f140029 100644
>> --- a/arch/arm64/kernel/traps.c
>> +++ b/arch/arm64/kernel/traps.c
>> @@ -147,17 +147,14 @@ static void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk)
>> {
>> struct stackframe frame;
>> unsigned long _irq_stack_ptr = per_cpu(irq_stack_ptr, smp_processor_id());
>> + int skip;
>>
>> pr_debug("%s(regs = %p tsk = %p)\n", __func__, regs, tsk);
>>
>> if (!tsk)
>> tsk = current;
>>
>> - if (regs) {
>> - frame.fp = regs->regs[29];
>> - frame.sp = regs->sp;
>> - frame.pc = regs->pc;
>> - } else if (tsk == current) {
>> + if (tsk == current) {
>> frame.fp = (unsigned long)__builtin_frame_address(0);
>> frame.sp = current_stack_pointer;
>> frame.pc = (unsigned long)dump_backtrace;
>> @@ -173,13 +170,27 @@ static void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk)
>> frame.graph = tsk->curr_ret_stack;
>> #endif
>>
>> + skip = (regs ? 1 : 0);
>> pr_emerg("Call trace:\n");
>> while (1) {
>> unsigned long where = frame.pc;
>> unsigned long stack;
>> int ret;
>>
>> - dump_backtrace_entry(where);
>> + /* skip until specified stack frame */
>> + if (!skip)
>> + dump_backtrace_entry(where);
>> + else if (frame.fp == regs->regs[29]) {
>> + skip = 0;
>> + /*
>> + * Mostly, this is the case where this function is
>> + * called in panic/abort. As exception handler's
>> + * stack frame does not contain the corresponding pc
>> + * at which an exception has taken place, use regs->pc
>> + * instead.
>> + */
>> + dump_backtrace_entry(regs->pc);
>> + }
>> ret = unwind_frame(tsk, &frame);
>> if (ret < 0)
>> break;
>> --
>> 1.7.9.5
>
> This fixes up the issue I mentioned.
>
> Best Regards
> Jungseok Lee
>
More information about the linux-arm-kernel
mailing list