[RFC PATCH 0/3] arm64: Implement reliable stack trace

Mark Rutland mark.rutland at arm.com
Wed Jan 27 11:40:56 EST 2021


On Wed, Jan 27, 2021 at 08:02:41AM -0600, Madhavan T. Venkataraman wrote:
> 
> 
> On 10/12/20 12:26 PM, Mark Brown wrote:
> > This patch series aims to implement reliable stacktrace for arm64.
> > Reliable stacktrace exists mainly to support live patching; it provides
> > a version of stacktrace that checks for consistency problems in the
> > traces it generates and provides an error code to callers indicating if
> > any problems were detected.
> > 
> > This is a first cut of support for arm64; I've not really even started
> > testing it meaningfully at this point.  The main thing I'm looking for
> > here is that I'm not sure if there are any more potential indicators of
> > unreliable stacks that I'm missing tests for or anything about the
> > interfaces that I've misunderstood.
> > 
> > There's more work that can be done here, mainly that we could sync our
> > unwinder more with what's done on S/390 and x86 which should if nothing
> > else help with keeping up to date with generic changes, but this should 
> > be what's needed to allow reliable stack trace.
> > 
> > Mark Brown (2):
> >   arm64: stacktrace: Report when we reach the end of the stack
> >   arm64: stacktrace: Implement reliable stacktrace
> > 
> > Mark Rutland (1):
> >   arm64: remove EL0 exception frame record
> > 
> >  arch/arm64/Kconfig             |  1 +
> >  arch/arm64/kernel/entry.S      | 10 +++----
> >  arch/arm64/kernel/stacktrace.c | 55 ++++++++++++++++++++++++++++------
> >  3 files changed, 52 insertions(+), 14 deletions(-)
> > 
> 
> This is mostly a question to improve my understanding of the current ARM64
> unwinder.
> 
> Currently, ARM64 defines different stack types - task stack, IRQ stack, etc.
> When it unwinds, it appears to unwind only the currently active stack.

The current (unreliable) unwinder will unwind across stack changes. It
detects stack transitions and will happily unwind across multiple
stacks so long as these do not loop.

However, where a backtrace crosses an exception boundary, there are
cases where this could in theory omit an entry from the backtrace. The
LR and FP are only guaranteed to be in a consistent state at function
call boundaries. Since exceptions can be taken in the middle of
functions (or in trampolines which transiently place these in an
inconsistent state), we cannot reliably backtrace across exception
boundaries (which may or may not involve a change of stack) unless we
have additional metadata and/or guarantees from compilers on how these
are manipulated.

Where we change stack without an exception boundary, we can reliably
unwind.
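
As a rough illustration of that distinction (a user-space sketch, not the
arm64 code), a walker could keep producing entries but flag the whole
trace as unreliable once it passes a record created by exception entry.
The `exception_boundary` tag, `struct sim_frame` and `sim_trace()` are
hypothetical names made up for this example; how a real unwinder would
recognise such a record is exactly the open question above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a frame record. All names here are hypothetical. */
struct sim_frame {
	const struct sim_frame *caller;	/* next record, NULL at the end */
	uintptr_t ret_addr;		/* saved return address (LR) */
	bool exception_boundary;	/* assumed tag: created by exception entry */
};

struct sim_trace_result {
	size_t nr;	/* number of entries written to the output array */
	bool reliable;	/* false once an exception boundary was crossed */
};

/*
 * Walk the chain starting at @f, storing up to @max return addresses.
 * The trace is still produced in full, but it is only reported as
 * reliable if no exception boundary was seen and nothing was truncated.
 */
static struct sim_trace_result sim_trace(const struct sim_frame *f,
					 uintptr_t *out, size_t max)
{
	struct sim_trace_result r = { 0, true };

	assert(out != NULL);

	for (; f != NULL; f = f->caller) {
		if (r.nr == max) {
			r.reliable = false;	/* ran out of space */
			break;
		}
		if (f->exception_boundary)
			r.reliable = false;	/* LR/FP may be inconsistent here */
		out[r.nr++] = f->ret_addr;
	}
	return r;
}
```

A caller that needs reliability would discard the trace when `reliable`
is false, while a debugging backtrace could still print it.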

> Specifically, if an interrupt has happened and the IRQ stack is the one that
> is active, only the IRQ stack is unwound. The task stack is not. Is this
> accurate?

The existing (unreliable) unwinder will unwind this case. The last frame
record on the IRQ stack will point to a frame record on the task stack,
and the unwinder will determine this can be safely accessed via the
on_accessible_stack() check. Having left the IRQ stack, it will
subsequently reject any frame record that points back onto the IRQ
stack (i.e. loops).
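
To make that concrete, here is a minimal user-space sketch of the idea,
loosely modelled on the checks described above rather than copied from
stacktrace.c. The types and helpers (`struct sim_record`,
`sim_classify()`, `sim_walk()`) are invented for the example, and only
two stack kinds are modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stack kinds; only two are modelled in this sketch. */
enum sim_stack { SIM_TASK, SIM_IRQ, SIM_NR, SIM_NONE };

struct sim_record {
	struct sim_record *fp;	/* caller's frame record, NULL at the end */
	uintptr_t lr;		/* return address */
};

struct sim_range { uintptr_t low, high; };	/* [low, high) */

struct sim_state {
	struct sim_range stacks[SIM_NR];
	bool done[SIM_NR];	/* stacks we have already unwound off */
	enum sim_stack cur;
};

/* Which known stack, if any, fully contains the record at @fr? */
static enum sim_stack sim_classify(const struct sim_state *s,
				   const struct sim_record *fr)
{
	uintptr_t a = (uintptr_t)fr;

	for (int k = 0; k < SIM_NR; k++)
		if (a >= s->stacks[k].low &&
		    a + sizeof(*fr) <= s->stacks[k].high)
			return (enum sim_stack)k;
	return SIM_NONE;
}

/* Returns the number of entries stored in @out, or -1 on a bad chain. */
static int sim_walk(struct sim_state *s, const struct sim_record *fr,
		    uintptr_t *out, int max)
{
	int n = 0;

	assert(out != NULL && max > 0);

	s->cur = sim_classify(s, fr);
	if (s->cur == SIM_NONE)
		return -1;

	while (fr) {
		const struct sim_record *next = fr->fp;
		enum sim_stack k;

		if (n == max)
			return -1;
		out[n++] = fr->lr;
		if (!next)
			break;

		k = sim_classify(s, next);
		if (k == SIM_NONE)
			return -1;	/* not on any accessible stack */
		if (k == s->cur) {
			/* Same stack: must move towards the stack base. */
			if ((uintptr_t)next <= (uintptr_t)fr)
				return -1;
		} else {
			/* Stack change: never return to a finished stack. */
			if (s->done[k])
				return -1;
			s->done[s->cur] = true;
			s->cur = k;
		}
		fr = next;
	}
	return n;
}
```

With a chain that runs from the IRQ stack onto the task stack, the walk
succeeds; if a task-stack record then points back onto the IRQ stack,
the walk fails instead of looping.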

> My question is - for live patching, we would need to look at the task stack
> as well, right?

Ideally, we would be able to do this, but currently we cannot safely do
so. IIUC this means that live patching is still possible, but is
potentially much slower to apply updates.

> Maybe we need to pass a flag to the unwinder to check the
> task stack in addition to the active stack?

The logic to unwind across stack and exception boundaries already
exists, but to make this reliable we will need more invasive work,
potentially changing trampolines and/or adding metadata for these,
perhaps requiring objtool and/or toolchain changes.

Thanks,
Mark.


