[PATCH v5 8/8] unwind: arm64: Use sframe to unwind interrupt frames
Mark Rutland
mark.rutland at arm.com
Tue May 12 03:07:59 PDT 2026
On Mon, May 11, 2026 at 08:00:21PM -0700, Dylan Hatch wrote:
> Hi Mark,
>
> Thanks for all the feedback and help on this. I'm planning on getting
> your comments addressed in the coming days, but I have some initial
> clarifying questions.
>
> On Fri, May 1, 2026 at 9:46 AM Mark Rutland <mark.rutland at arm.com> wrote:
> >
> > Hi Dylan,
> >
> > Thanks for putting this together. I think this is looking pretty good.
> > However, there are some things that aren't quite right and need some
> > work, which I've commented on below.
> >
> > More generally, there are a few things that aren't addressed by this
> > series that we will also need to address. Importantly:
> >
> > (1) For correctness, we'll need to address a latent issue with unwinding
> > across an fgraph return trampoline, where the return address is
> > transiently unrecoverable.
> >
> > Before this series, that doesn't matter for livepatching because the
> > livepatching code isn't called synchronously within the fgraph
> > handler, and unwinds which cross an exception boundary are marked as
> > unreliable.
> >
> > After this series, that does matter as we can unwind across an
> > exception boundary, and might happen to interrupt that transient
> > window.
> >
> > I think we can solve that with some restructuring of that code,
> > restoring the original address *before* removing that from the
> > fgraph return stack, and ensuring that the unwinder can find it.
>
> If my understanding is correct, the issue arrises in return_to_handler
> as the return address is recovered:
>
> mov x0, sp
> bl ftrace_return_to_handler // addr = ftrace_return_to_hander(fregs);
> mov x30, x0 // restore the original return address
>
> Because ftrace_return_to_handler pops the return address from the
> return stack before it can be restored into the LR, it cannot be
> recovered.
Yes.
To be clear, please don't worry about solving that for the next version
of this series; let's get the SFrame bits into shape first. I just
wanted to highlight that there's some more general work that we'll need
to do.
I think we can *detect* this case (and mark the unwind as unreliable)
with some tiny changes to the arm64 code. I'm happy to put that
together.
> Based on this, I believe you are suggesting to restructure this code
> path such that the return address is removed from the return stack
> only after it has been restored to LR. But since kernel/trace/fgraph.c
> is core kernel code, will this end up requiring either (1) a similar
> restructuring of other arches supporting ftrace, or (2) an
> arm64-specific implementation of this recovery logic?
Yes, I am say that to *recover* the address we'd need to make changes to
core code.
In the mean time we can *detect* this case with some minimal changes to
arm64 code, and abort. As above, I'm happy to go put that together.
> It looks to me like there is essentially the same recovery pattern on
> other arches; is there a reason this transient unrecoverability isn't
> an issue for reliable unwind on other platforms?
Yep; on all architectures there's a transient period where the address
cannot be recovered. It's not a correctness issue so long as the
architecture detects this case and marks the unwind as unreliable.
IIUC x86 will mark the unwind as unreliable in this case.
I don't know whether other architectures detect this reliably. That's a
question for loonarch, parisc, powerpc, and s390 folk.
> > I'm not immediately sure whether kretprobes has a similar issue.
> >
> > (2) To make unwinding generally possible, we'll need to annotate some
> > assembly functions as unwindable. We'll need to do that for string
> > routines under lib/, and probably some crypto code, but we don't
> > need to do that for most code in head.S, entry.S, etc.
> >
> > The vast majority of relevant assembly functions are leaf functions
> > (where the return address is never moved out of the LR), so we can
> > probably get away with a simple annotation for those that avoids the
> > need for open-coded CFI directives everywhere.
>
> Are you suggesting something like a SYM_LEAF_FUNC_(START|END), that
> wraps CFI directives for leaf functions?
Yep; that's exactly the sort of thing I was thinking of.
That or have a seaprate annotation we can add, e.g.
SYM_FUNC_START(foo)
SYM_FUN_END(foo)
SYM_FUNC_IS_LEAF_AND_DOES_NOT_TOUCH_LR(foo)
> > I've pushed some reliable stacktrace tests to:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git stacktrace/tests
> >
> > That finds the fgraph issue (regardless of this series). When merged
> > with this series triggers a warning in kunwind_next_frame_record_meta(),
> > where unwind_next_frame_sframe() calls that erroneously as a fallback.
>
> Thanks for the pointer on these tests, they're super useful! I've been
> able to reproduce the fgraph failure you mentioned.
Great!
Mark.
More information about the linux-arm-kernel
mailing list