[RFC PATCH 0/3] arm64: Implement reliable stack trace

Mark Rutland mark.rutland at arm.com
Thu Oct 15 10:16:12 EDT 2020


On Thu, Oct 15, 2020 at 03:39:37PM +0200, Miroslav Benes wrote:
> Hi,

Hi all,

> On Mon, 12 Oct 2020, Mark Brown wrote:
> 
> > This patch series aims to implement reliable stacktrace for arm64. 
> > Reliable stacktrace exists mainly to support live patching, it provides
> > a version of stacktrace that checks for consistency problems in the
> > traces it generates and provides an error code to callers indicating if
> > any problems were detected.      
> > 
> > This is a first cut of support for arm64, I've not really even started
> > testing it meaningfully at this point.  The main thing I'm looking for
> > here is that I'm not sure if there are any more potential indicators of
> > unreliable stacks that I'm missing tests for or anything about the
> > interfaces that I've misunderstood.
> 
> I'll just copy an excerpt from my notes about the required guarantees.
> Written by Josh (CCed, he has a better idea about the problem than me
> anyway).
> 
> "
> The unwinder needs to be able to detect all stack corruption and return
> an error.
> [ But note that we don't need to worry about unwinding a task's stack
> while the task is running, which can be a common source of
> "corruption".  For livepatch we make sure every task is blocked
> (except when checking the current task). ]
> 
> It also needs to:
> - detect preemption / page fault frames and return an error
> - only return success if it reaches the end of the task stack; for user
>   tasks, that means the syscall barrier; for kthreads/idle tasks, that
>   means finding a defined thread entry point
> - make sure it can't get into a recursive loop
> - make sure each return address is a valid text address
> - properly detect generated code hacks like function graph tracing and
>   kretprobes
> "

It would be great if we could put something like the above into the
kernel tree, either under Documentation/ or in a comment somewhere for
the reliable stacktrace functions.
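
As a rough illustration of how those requirements could map onto a walk
loop, here is a self-contained userspace sketch. It is not kernel code:
struct frame, enum frame_kind, the TEXT_START/TEXT_END bounds and
walk_reliable() are all hypothetical names made up for this example, and
a real unwinder might recover the original return address for
ftrace/kretprobes frames rather than rejecting them outright.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical frame classification; none of these names are kernel API. */
enum frame_kind {
	FRAME_NORMAL,
	FRAME_EXCEPTION,	/* preemption / page fault boundary */
	FRAME_TRAMPOLINE,	/* return address rewritten by graph tracing / kretprobes */
	FRAME_ENTRY,		/* syscall barrier or known thread entry point */
};

struct frame {
	uintptr_t fp;		/* address of this frame record */
	uintptr_t pc;		/* return address found in the record */
	enum frame_kind kind;
};

/* Assumed bounds of the text section for this toy model. */
#define TEXT_START	((uintptr_t)0x1000)
#define TEXT_END	((uintptr_t)0x2000)

static bool is_text_address(uintptr_t pc)
{
	return pc >= TEXT_START && pc < TEXT_END;
}

/*
 * Walk frames[0..n) from innermost to outermost.
 * Returns 0 only if the walk reaches a defined entry point without
 * seeing anything suspicious, and -1 otherwise.
 */
static int walk_reliable(const struct frame *frames, size_t n)
{
	uintptr_t prev_fp = 0;

	for (size_t i = 0; i < n; i++) {
		const struct frame *f = &frames[i];

		/* Each return address must be a valid text address. */
		if (!is_text_address(f->pc))
			return -1;

		/*
		 * The stack grows down, so frame records must sit at
		 * strictly increasing addresses as we unwind. This also
		 * rules out looping forever on a corrupted chain.
		 */
		if (i > 0 && f->fp <= prev_fp)
			return -1;

		/* Exception frames and rewritten return addresses: give up. */
		if (f->kind == FRAME_EXCEPTION || f->kind == FRAME_TRAMPOLINE)
			return -1;

		/* Success only when a defined end of stack is reached. */
		if (f->kind == FRAME_ENTRY)
			return 0;

		prev_fp = f->fp;
	}

	/* Ran out of frames without finding an entry point. */
	return -1;
}
```

The sketch is deliberately conservative: anything it cannot positively
classify as safe is reported as unreliable, which is the failure mode
livepatch can tolerate.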

AFAICT, existing architectures don't always handle all of the above in
arch_stack_walk_reliable(). For example, it looks like x86 assumes
unwinding through exceptions is reliable for !CONFIG_FRAME_POINTER, but I
think this might not always be true.

I was planning to send a mail once I've finished writing a test, but
IIUC there are some windows where ftrace/kretprobes detection/repainting
may not work, e.g. if preempted after ftrace_return_to_handler()
decrements curr_ret_stack, but before the arch trampoline asm restores
the original return addr. So we might need something like an
in_return_trampoline() to detect and report that reliably.

Thanks,
Mark.


