[RFC PATCH 0/3] arm64: Implement reliable stack trace
Mark Rutland
mark.rutland at arm.com
Thu Oct 15 10:16:12 EDT 2020
On Thu, Oct 15, 2020 at 03:39:37PM +0200, Miroslav Benes wrote:
> Hi,
Hi all,
> On Mon, 12 Oct 2020, Mark Brown wrote:
>
> > This patch series aims to implement reliable stacktrace for arm64.
> > Reliable stacktrace exists mainly to support live patching, it provides
> > a version of stacktrace that checks for consistency problems in the
> > traces it generates and provides an error code to callers indicating if
> > any problems were detected.
> >
> > This is a first cut of support for arm64, I've not really even started
> > testing it meaningfully at this point. The main thing I'm looking for
> > here is that I'm not sure if there are any more potential indicators of
> > unreliable stacks that I'm missing tests for or anything about the
> > interfaces that I've misunderstood.
>
> I'll just copy an excerpt from my notes about the required guarantees.
> Written by Josh (CCed, he has better idea about the problem than me
> anyway).
>
> "
> The unwinder needs to be able to detect all stack corruption and return
> an error.
> [ But note that we don't need to worry about unwinding a task's stack
> while the task is running, which can be a common source of
> "corruption". For livepatch we make sure every task is blocked
> (except when checking the current task). ]
>
> It also needs to:
> - detect preemption / page fault frames and return an error
> - only return success if it reaches the end of the task stack; for user
> tasks, that means the syscall barrier; for kthreads/idle tasks, that
> means finding a defined thread entry point
> - make sure it can't get into a recursive loop
> - make sure each return address is a valid text address
> - properly detect generated code hacks like function graph tracing and
> kretprobes
> "
It would be great if we could put something like the above into the
kernel tree, either under Documentation/ or in a comment somewhere for
the reliable stacktrace functions.
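To make the list above more concrete, here is a rough user-space model (a sketch, not kernel code) of a frame-pointer walker that enforces a subset of those rules. Every {fp, lr} record must sit on the task's stack at a strictly higher address than the previous one, so the walk cannot loop. Every return address must be in text. An exception/preemption frame or a truncated trace is an error, and success is reported only on reaching a defined entry point. The struct unwind_env fields, the exception_ret marker and walk_reliable() are all hypothetical and do not correspond to arm64's real unwinder. ftrace and kretprobe repainting are left out.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical environment for the model; not a real kernel structure. */
struct unwind_env {
	uintptr_t stack_low, stack_high; /* bounds of the blocked task's stack */
	uintptr_t text_start, text_end;  /* bounds of valid kernel text */
	uintptr_t exception_ret;         /* hypothetical exception/preemption marker */
	uintptr_t entry_point;           /* defined thread entry point (last frame) */
};

static bool in_text(const struct unwind_env *env, uintptr_t pc)
{
	return pc >= env->text_start && pc < env->text_end;
}

/*
 * Walk a chain of {fp, lr} frame records (AAPCS64 layout) starting at fp.
 * Returns 0 only if every check passes and the walk ends at the entry point;
 * anything doubtful is reported as -EINVAL instead of a partial trace.
 */
static int walk_reliable(const struct unwind_env *env, uintptr_t fp,
			 uintptr_t *trace, size_t max, size_t *nr)
{
	uintptr_t prev_fp = 0, last_pc = 0;

	*nr = 0;
	while (fp != 0) {
		const uintptr_t *rec;

		/* The whole record must lie on this task's stack, aligned. */
		if (fp < env->stack_low || fp >= env->stack_high ||
		    env->stack_high - fp < 2 * sizeof(uintptr_t) ||
		    fp % sizeof(uintptr_t) != 0)
			return -EINVAL;
		/* Caller records live at strictly higher addresses: no loops. */
		if (prev_fp != 0 && fp <= prev_fp)
			return -EINVAL;

		rec = (const uintptr_t *)fp;
		last_pc = rec[1];
		/* Preemption/exception frames make the trace unreliable. */
		if (last_pc == env->exception_ret)
			return -EINVAL;
		/* Every return address must be a valid text address. */
		if (!in_text(env, last_pc))
			return -EINVAL;
		/* A truncated trace is not a reliable one. */
		if (*nr == max)
			return -EINVAL;
		trace[(*nr)++] = last_pc;

		prev_fp = fp;
		fp = rec[0];
	}

	/* Success only if we reached the defined end of the task stack. */
	return (*nr > 0 && last_pc == env->entry_point) ? 0 : -EINVAL;
}
```

A caller would treat any non-zero return as "don't trust this trace", which is the contract livepatch depends on; returning a best-effort partial trace is deliberately not an option.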
AFAICT, existing architectures don't always handle all of the above in
arch_stack_walk_reliable(). For example, it looks like x86 assumes
unwinding through exceptions is reliable for !CONFIG_FRAME_POINTER, but I
think this might not always be true.
I was planning to send a mail once I've finished writing a test, but
IIUC there are some windows where ftrace/kretprobes detection/repainting
may not work, e.g. if preempted after ftrace_return_to_handler()
decrements curr_ret_stack, but before the arch trampoline asm restores
the original return addr. So we might need something like an
in_return_trampoline() to detect and report that reliably.
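A rough user-space model of the in_return_trampoline() idea might look like the following. It assumes the return trampoline occupies a known address range, that the function graph tracer's hijacked return addresses equal the trampoline's start address, and that the blocked task's stopped pc is available to the unwinder. The struct names, repaint_graph_returns() and the exact signature of in_return_trampoline() are made up for illustration. The point is that a task stopped inside the trampoline is reported as unreliable rather than repainted, since the original return address may already have been popped and may live only in a register.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical address range of the arch return trampoline, i.e. the asm
 * that calls ftrace_return_to_handler() and then branches to the original
 * return address. */
struct tramp_range {
	uintptr_t start, end;
};

/* Simulated graph return stack: curr plays the role of curr_ret_stack and
 * indexes the newest saved return address, or is -1 when empty. */
struct graph_ret_stack {
	uintptr_t ret[8];
	int curr;
};

/* Hypothetical helper in the spirit of the proposed in_return_trampoline(). */
static bool in_return_trampoline(const struct tramp_range *t, uintptr_t pc)
{
	return pc >= t->start && pc < t->end;
}

/*
 * Replace hijacked return addresses in pcs[] (innermost frame first) with
 * the originals saved on the return stack. Returns -EINVAL whenever the
 * repainted trace could be wrong.
 */
static int repaint_graph_returns(const struct tramp_range *t,
				 const struct graph_ret_stack *rs,
				 uintptr_t stopped_pc, uintptr_t *pcs, size_t n)
{
	int idx = rs->curr;

	/*
	 * Stopped inside the trampoline: ftrace_return_to_handler() may have
	 * already decremented the return stack, so the original return address
	 * is not recoverable from it and would be missing from pcs[].
	 */
	if (in_return_trampoline(t, stopped_pc))
		return -EINVAL;

	for (size_t i = 0; i < n; i++) {
		if (pcs[i] == t->start) {
			/* Hijacked return: use the newest unconsumed saved entry. */
			if (idx < 0)
				return -EINVAL; /* return stack and trace disagree */
			pcs[i] = rs->ret[idx--];
		} else if (in_return_trampoline(t, pcs[i])) {
			/* A return address into the middle of the trampoline. */
			return -EINVAL;
		}
	}
	return 0;
}
```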
Thanks,
Mark.