[PATCH v1] powerpc: Include running function as first entry in save_stack_trace() and friends

Thu Mar 4 15:30:34 GMT 2021

On Thu, 4 Mar 2021 at 15:57, Mark Rutland <mark.rutland at arm.com> wrote:
> [adding Mark Brown]
>
> On Wed, Mar 03, 2021 at 04:20:43PM +0100, Marco Elver wrote:
> > On Wed, Mar 03, 2021 at 03:52PM +0100, Christophe Leroy wrote:
> > > Le 03/03/2021 ï¿½ 15:38, Marco Elver a ï¿½critï¿½:
> > > > On Wed, 3 Mar 2021 at 15:09, Christophe Leroy
> > > > <christophe.leroy at csgroup.eu> wrote:
> > > > >
> > > > > It seems like all other sane architectures, namely x86 and arm64
> > > > > at least, include the running function as top entry when saving
> > > > > stack trace.
> > > > >
> > > > > Functionnalities like KFENCE expect it.
> > > > >
> > > > > Do the same on powerpc, it allows KFENCE to properly identify the faulting
> > > > > function as depicted below. Before the patch KFENCE was identifying
> > > > > finish_task_switch.isra as the faulting function.
> > > > >
> > > > > [   14.937370] ==================================================================
> > > > > [   14.948692] BUG: KFENCE: invalid read in test_invalid_access+0x54/0x108
> > > > > [   14.948692]
> > > > > [   14.956814] Invalid read at 0xdf98800a:
> > > > > [   14.960664]  test_invalid_access+0x54/0x108
> > > > > [   14.964876]  finish_task_switch.isra.0+0x54/0x23c
> > > > > [   14.969606]  kunit_try_run_case+0x5c/0xd0
> > > > > [   14.973658]  kunit_generic_run_threadfn_adapter+0x24/0x30
> > > > > [   14.979079]  kthread+0x15c/0x174
> > > > > [   14.982342]  ret_from_kernel_thread+0x14/0x1c
> > > > > [   14.986731]
> > > > > [   14.988236] CPU: 0 PID: 111 Comm: kunit_try_catch Tainted: G    B             5.12.0-rc1-01537-g95f6e2088d7e-dirty #4682
> > > > > [   14.999795] NIP:  c016ec2c LR: c02f517c CTR: c016ebd8
> > > > > [   15.004851] REGS: e2449d90 TRAP: 0301   Tainted: G    B              (5.12.0-rc1-01537-g95f6e2088d7e-dirty)
> > > > > [   15.015274] MSR:  00009032 <EE,ME,IR,DR,RI>  CR: 22000004  XER: 00000000
> > > > > [   15.022043] DAR: df98800a DSISR: 20000000
> > > > > [   15.022043] GPR00: c02f517c e2449e50 c1142080 e100dd24 c084b13c 00000008 c084b32b c016ebd8
> > > > > [   15.022043] GPR08: c0850000 df988000 c0d10000 e2449eb0 22000288
> > > > > [   15.040581] NIP [c016ec2c] test_invalid_access+0x54/0x108
> > > > > [   15.046010] LR [c02f517c] kunit_try_run_case+0x5c/0xd0
> > > > > [   15.051181] Call Trace:
> > > > > [   15.053637] [e2449e50] [c005a68c] finish_task_switch.isra.0+0x54/0x23c (unreliable)
> > > > > [   15.061338] [e2449eb0] [c02f517c] kunit_try_run_case+0x5c/0xd0
> > > > > [   15.067215] [e2449ed0] [c02f648c] kunit_generic_run_threadfn_adapter+0x24/0x30
> > > > > [   15.074472] [e2449ef0] [c004e7b0] kthread+0x15c/0x174
> > > > > [   15.079571] [e2449f30] [c001317c] ret_from_kernel_thread+0x14/0x1c
> > > > > [   15.085798] Instruction dump:
> > > > > [   15.088784] 8129d608 38e7ebd8 81020280 911f004c 39000000 995f0024 907f0028 90ff001c
> > > > > [   15.096613] 3949000a 915f0020 3d40c0d1 3d00c085 <8929000a> 3908adb0 812a4b98 3d40c02f
> > > > > [   15.104612] ==================================================================
> > > > >
> > > > > Signed-off-by: Christophe Leroy <christophe.leroy at csgroup.eu>
> > > >
> > > > Acked-by: Marco Elver <elver at google.com>
> > > >
> > > > Thank you, I think this looks like the right solution. Just a question below:
> > > >
> > > ...
> > >
> > > > > @@ -59,23 +70,26 @@ void save_stack_trace(struct stack_trace *trace)
> > > > >
> > > > >          sp = current_stack_frame();
> > > > >
> > > > > -       save_context_stack(trace, sp, current, 1);
> > > > > +       save_context_stack(trace, sp, (unsigned long)save_stack_trace, current, 1);
> > > >
> > > > This causes ip == save_stack_trace and also below for
> > > > save_stack_trace_tsk. Does this mean save_stack_trace() is included in
> > > > the trace? Looking at kernel/stacktrace.c, I think the library wants
> > > > to exclude itself from the trace, as it does '.skip = skipnr + 1' (and
> > > > '.skip   = skipnr + (current == tsk)' for the _tsk variant).
> > > >
> > > > If the arch-helper here is included, should this use _RET_IP_ instead?
> > > >
> > >
> > > Don't really know, I was inspired by arm64 which has:
> > >
> > > void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie,
> > >                  struct task_struct *task, struct pt_regs *regs)
> > > {
> > >     struct stackframe frame;
> > >
> > >     if (regs)
> > >             start_backtrace(&frame, regs->regs[29], regs->pc);
> > >     else if (task == current)
> > >             start_backtrace(&frame,
> > >                             (unsigned long)__builtin_frame_address(0),
> > >                             (unsigned long)arch_stack_walk);
> > >     else
> > >             start_backtrace(&frame, thread_saved_fp(task),
> > >                             thread_saved_pc(task));
> > >
> > >     walk_stackframe(task, &frame, consume_entry, cookie);
> > > }
> > >
> > > But looking at x86 you may be right, so what should be done really ?
> >
> > x86:
> >
> > [    2.843292] calling stack_trace_save:
> > [    2.843705]  test_func+0x6c/0x118
> > [    2.844184]  do_one_initcall+0x58/0x270
> > [    2.844618]  kernel_init_freeable+0x1da/0x23a
> > [    2.845110]  kernel_init+0xc/0x166
> > [    2.845494]  ret_from_fork+0x22/0x30
> >
> > [    2.867525] calling stack_trace_save_tsk:
> > [    2.868017]  test_func+0xa9/0x118
> > [    2.868530]  do_one_initcall+0x58/0x270
> > [    2.869003]  kernel_init_freeable+0x1da/0x23a
> > [    2.869535]  kernel_init+0xc/0x166
> > [    2.869957]  ret_from_fork+0x22/0x30
> >
> > arm64:
> >
> > [    3.786911] calling stack_trace_save:
> > [    3.787147]  stack_trace_save+0x50/0x78
> > [    3.787443]  test_func+0x84/0x13c
> > [    3.787738]  do_one_initcall+0x5c/0x310
> > [    3.788099]  kernel_init_freeable+0x214/0x294
> > [    3.788363]  kernel_init+0x18/0x164
> > [    3.788585]  ret_from_fork+0x10/0x30
> >
> > [    3.803615] calling stack_trace_save_tsk:
> > [    3.804266]  stack_trace_save_tsk+0x9c/0x100
> > [    3.804541]  test_func+0xc4/0x13c
> > [    3.804803]  do_one_initcall+0x5c/0x310
> > [    3.805031]  kernel_init_freeable+0x214/0x294
> > [    3.805284]  kernel_init+0x18/0x164
> > [    3.805505]  ret_from_fork+0x10/0x30
> >
> > +Cc arm64 folks.
> >
> > So I think the arm64 version also has a bug, because I think a user of
> > <linux/stacktrace.h> really doesn't care about the library function
> > itself. And from reading kernel/stacktrace.c I think it wants to exclude
> > itself entirely.
> >
> > It's a shame that <linux/stacktrace.h> isn't better documented, but I'm
> > pretty sure that including the library functions in the trace is not
> > useful.
>
> I agree this behaviour isn't desireable, and that the lack of
> documentation is unfortunate.
>
> It looks like GCC is happy to give us the function-entry-time FP if we use
> __builtin_frame_address(1), and assuming clang is similarly happy we can do:
>
> | diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
> | index ad20981dfda4..5dfbf915eb7f 100644
> | --- a/arch/arm64/kernel/stacktrace.c
> | +++ b/arch/arm64/kernel/stacktrace.c
> | @@ -203,8 +203,8 @@ void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie,
> |                 start_backtrace(&frame, regs->regs[29], regs->pc);
> |         else if (task == current)
> |                 start_backtrace(&frame,
> | -                               (unsigned long)__builtin_frame_address(0),
> | -                               (unsigned long)arch_stack_walk);
> | +                               (unsigned long)__builtin_frame_address(1),
> | +                               (unsigned long)__builtin_return_address(0));
> |         else
> |                 start_backtrace(&frame, thread_saved_fp(task),
> |                                 thread_saved_pc(task));
>
> ... such that arch_stack_walk() will try to avoid including itself in a
> trace, and so the existing skipping should (w/ caveats below) skip
> stack_trace_save() or stack_trace_save_tsk().

Thank you! Yes, that works.

> If that works for you, I can spin that as a patch, though we'll need to
> check that doesn't introduce a new fencepost error elsewhere.
>
> The bigger problem here is that skipping is dodgy to begin with, and
> this is still liable to break in some cases. One big concern is that
> (especially with LTO) we cannot guarantee the compiler will not inline
> or outline functions, causing the skipp value to be too large or too
> small. That's liable to happen to callers, and in theory (though
> unlikely in practice), portions of arch_stack_walk() or
> stack_trace_save() could get outlined too.
>
> Unless we can get some strong guarantees from compiler folk such that we
> can guarantee a specific function acts boundary for unwinding (and
> doesn't itself get split, etc), the only reliable way I can think to
> solve this requires an assembly trampoline. Whatever we do is liable to
> need some invasive rework.

Will LTO and friends respect 'noinline'? One thing I also noticed is
that tail calls would also cause the stack trace to appear somewhat
incomplete (for some of my tests I've disabled tail call
optimizations). Is there a way to also mark a function
non-tail-callable? But I'm also not sure if with all that we'd be
guaranteed the code we want, even though in practice it might.

Thanks,
-- Marco