ARM Ftrace Function Graph Fails With UNWINDER_FRAME_POINTER

Ard Biesheuvel ardb at kernel.org
Sat Dec 2 01:26:54 PST 2023


On Sat, 2 Dec 2023 at 09:49, Justin Chen <justin.chen at broadcom.com> wrote:
>
>
>
> On 12/1/23 10:53 PM, Ard Biesheuvel wrote:
> > On Fri, 1 Dec 2023 at 23:59, Justin Chen <justin.chen at broadcom.com> wrote:
> >>
> >>
> >>
> >> On 12/1/23 10:07 AM, Steven Rostedt wrote:
> >>> On Fri, 1 Dec 2023 09:25:59 -0800
> >>> Justin Chen <justin.chen at broadcom.com> wrote:
> >>>
> >>>>> It appears the sub instruction at 0x6dd0 correctly accounts for the
> >>>>> extra 8 bytes, so the frame pointer is valid. So it is our assumption
> >>>>> that there are no gaps between the stack frames is invalid.
> >>>>
> >>>> Thanks for the assistance. The gap between the stack frame depends on
> >>>> the function. Most do not have a gap. Some have 8 (as shown above), some
> >>>> have 12. A single assumption here is not going to work. I'm having a
> >>>> hard time finding out the reasoning for this gap. I tried disabling a
> >>>> bunch of gcc flags as well as -O2 and the gap still exists.
> >>>
> >>> That code was originally added because of some strange things that gcc did
> >>> with mcount (for example, it made a copy of the stack frame that it passed
> >>> to mcount, where the function graph tracer replaced the copy of the return
> >>> stack making the shadow stack go out of sync and crash). This was very hard
> >>> to debug and I added this code to detect it if it happened again.
> >>>
> >>> Well it's been over a decade since that happened (2009).
> >>>
> >>>     71e308a239c09 ("function-graph: add stack frame test")
> >>>
> >>> I'm happy assuming that the compiler folks are aware of our tricks with
> >>> hijacking return calls and I don't expect it to happen again. We can just
> >>> rip out those checks. That is, if it's only causing false positives, I
> >>> don't think it's worth keeping around.
> >>>
> >>> Has it detected any real issues on the Arm platforms?
> >>>
> >>> -- Steve
> >>
> >> I am not familiar enough to make a call. But from my limited testing
> >> with ARM, I didn't see any issues. If you would like me to, I can submit
> >> a patch to remove the check entirely. Or maybe only disable it for ARM?
> >>
> >
> > Please try the fix I proposed first.
>
> Just tested it. Seems to do the trick.

Thanks

> Either solution works for me.
>

Given that this discussion is taking place in the context of the
report of an issue identified by GRAPH_FP_TEST, I don't see how
removing that would be a reasonable conclusion.

> FWIW I also experimented with LLVM, looks like function_graph just
> crashes regardless of the issue being discussed. The disassemble of
> LLVM[1] does something completely different.
>


LLVM does not support CONFIG_UNWINDER_FRAME_POINTER so the fact that
the prologue looks different is expected.

In the case below, the FP {r11} is correctly made to point to a {r11,
lr} tuple on the stack, so the codegen is correct AFAICT. But IIRC we
rely on unwind information for ftrace in this case, not the frame
pointer.

Could you be more specific about the crash?


>
> [1]
> LLVM dump
> c0c6faa0 <sk_getsockopt>:
> c0c6faa0: f0 4f 2d e9   push    {r4, r5, r6, r7, r8, r9, r10, r11, lr}
> c0c6faa4: 1c b0 8d e2   add     r11, sp, #28
> c0c6faa8: ac d0 4d e2   sub     sp, sp, #172
> c0c6faac: 00 70 a0 e1   mov     r7, r0
> c0c6fab0: c8 0c 04 e3   movw    r0, #19656
> c0c6fab4: 80 02 4c e3   movt    r0, #49792
> c0c6fab8: 03 50 a0 e1   mov     r5, r3
> c0c6fabc: 00 00 90 e5   ldr     r0, [r0]
> c0c6fac0: 02 a0 a0 e1   mov     r10, r2
> c0c6fac4: 20 00 0b e5   str     r0, [r11, #-32]
> c0c6fac8: 00 40 2d e9   stmdb   sp!, {lr}
> c0c6facc: 4b 8b d6 eb   bl      0xc0212800 <__gnu_mcount_nc> @ imm =
> #-10867412



More information about the linux-arm-kernel mailing list