[RFC PATCH 0/3] arm64: Implement reliable stack trace

Madhavan T. Venkataraman madvenka at linux.microsoft.com
Mon Feb 1 10:21:43 EST 2021



On 1/28/21 9:26 AM, Josh Poimboeuf wrote:
>> If we're trusting the compiler we can probably just do that without any
>> explicit support from the compiler - it should be doing the standard
>> stuff unless we explicitly ask it to and if it isn't then it might be a
>> result of a mismatch in assumptions rather than a deliberate decision to
>> do something non-standard.  My understanding with objtool is that a big
>> part of the idea is to provide a static check that the binary we end up
>> with matches the assumptions that we are making so the fact that it's a
>> separate implementation is important.
> For C code, even if we trusted the compiler (which we don't), we still
> have inline asm which the compiler doesn't have any visibility to, which
> is more than capable of messing up frame pointers (we had several cases
> of this in x86).
> 
> Getting the assembler to annotate which functions are FP could be
> interesting but:
> 
> a) good luck getting the assembler folks to do that; they tend to be
>    insistent on being ignorant of code semantics;
> 
> b) assembly often resembles spaghetti and the concept of a function is
>    quite fluid; in fact many functions aren't annotated as such.

OK. Before this whole discussion, I did not know that the compiler cannot be trusted.
So, it looks like objtool is definitely needed. However, I believe we can minimize
the work objtool does by using a shadow stack.

I read Mark Brown's response to my shadow stack email. I agree with him. The shadow
stack looks promising.

So, here is my suggestion for the shadow stack. This is just to start the discussion
on the shadow stack.

Prolog and epilog for C functions
=================================

A shadow stack prolog and epilog are needed. Let us add a new compiler option that
generates extra no-ops at the beginning of a function (for the prolog) and just before
each return (for the epilog), so that some other entity such as objtool can insert its
own prolog and epilog. This way, we don't have to trust the compiler and can maintain
our own prolog and epilog.

Objtool will check for the no-ops. If they are present, it will replace the no-ops with
the shadow stack prolog and epilog. It can also check the frame pointer prolog and
epilog.

Then, it will set a flag in the symbol table entry of the function to indicate that
the function has a proper prolog and epilog.

Prolog and epilog for assembly functions
========================================

The no-ops and frame pointer prolog and epilog can be added to assembly functions manually.
Objtool will process them as above.

Decoding
========

To do all this, objtool has to decode only the following instructions.

        - no-op
        - return instruction
        - store register pair in frame pointer prolog
        - load register pair in frame pointer epilog

This simplifies the objtool part a lot. AFAIK, all instructions in ARM64 are
32 bits wide. So, objtool does not have to decode an instruction to know its
length.
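
To give a feel for how small the decoder could be, here is a rough sketch in C.
It is not objtool code. The enum, the function names and the choice of frame
record forms are my assumptions for illustration: it only recognizes the
pre-indexed "stp x29, x30, [sp, #imm]!" and the post-indexed
"ldp x29, x30, [sp], #imm", and it assumes the instruction words have already
been read as little-endian 32-bit values.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical instruction classes; the names are illustrative only. */
enum insn_class { INSN_OTHER, INSN_NOP, INSN_RET, INSN_FP_PUSH, INSN_FP_POP };

/*
 * Classify one 32-bit A64 instruction word. The masks clear the fields
 * that may vary (Rn for ret, imm7 for stp/ldp) before comparing.
 */
static enum insn_class classify_insn(uint32_t insn)
{
	if (insn == 0xd503201fu)                 /* nop */
		return INSN_NOP;
	if ((insn & 0xfffffc1fu) == 0xd65f0000u) /* ret xN */
		return INSN_RET;
	if ((insn & 0xffc07fffu) == 0xa9807bfdu) /* stp x29, x30, [sp, #imm]! */
		return INSN_FP_PUSH;
	if ((insn & 0xffc07fffu) == 0xa8c07bfdu) /* ldp x29, x30, [sp], #imm */
		return INSN_FP_POP;
	return INSN_OTHER;
}

/*
 * Scan a function body for return instructions. Byte offsets of the
 * first 'max' returns are stored in 'offsets'; the total number of
 * returns found is returned (it may exceed 'max').
 */
static size_t find_returns(const uint32_t *code, size_t n_insns,
			   size_t *offsets, size_t max)
{
	size_t found = 0;

	for (size_t i = 0; i < n_insns; i++) {
		if (classify_insn(code[i]) != INSN_RET)
			continue;
		if (found < max)
			offsets[found] = i * sizeof(uint32_t);
		found++;
	}
	return found;
}
```

Because every instruction is one 32-bit word, the scan is a plain array walk.
Frame layouts that use a different store or load form would need more cases.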

Objtool has to scan a function for the return instruction to know the location(s)
of the epilog.

I guess objtool still has to figure out unreachable code, alternatives and
all that sort of thing. But that logic is already there. Will alternatives
ever contain the return instruction? If not, objtool can skip processing
alternatives.

Shadow stack
============

Allocation
----------

Allocate the shadow stack and the regular stack adjacent to each other
or at a fixed distance from each other (maybe with a guard page in between?).
This is so the shadow stack can be accessed from any stack address using
a simple calculation.
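
As a rough illustration of the "simple calculation", here is a sketch in C.
Everything in it is an assumption on my part: the stack size, the single guard
page, the requirement that the regular stack is aligned to its size, and the
names STACK_SIZE, GUARD_SIZE, SHADOW_DELTA, stack_base(), shadow_base() and
shadow_slot().

```c
#include <stdint.h>

#define STACK_SIZE   ((uintptr_t)16 * 1024)      /* assumed, power of two */
#define GUARD_SIZE   ((uintptr_t)4 * 1024)       /* assumed guard page */
#define SHADOW_DELTA (STACK_SIZE + GUARD_SIZE)   /* assumed fixed distance */

/* Base of the regular stack that contains addr (needs size alignment). */
static inline uintptr_t stack_base(uintptr_t addr)
{
	return addr & ~(STACK_SIZE - 1);
}

/* Base of the shadow stack paired with the stack that contains addr. */
static inline uintptr_t shadow_base(uintptr_t addr)
{
	return stack_base(addr) + SHADOW_DELTA;
}

/*
 * For a parallel shadow stack: the shadow slot that mirrors a given
 * regular stack address is at the same offset in the shadow area.
 */
static inline uintptr_t shadow_slot(uintptr_t addr)
{
	return addr + SHADOW_DELTA;
}
```

With a compact shadow stack only shadow_base() would be needed, since the
entries are then found through the top of stack pointer rather than by offset.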

Top of shadow stack
-------------------

We can either use a compact shadow stack or a parallel shadow stack.
There are trade-offs in each. In either case, we need to know where the
top of the shadow stack is.

We could designate a register for that. But then, we have to make sure that
the register is not used anywhere else. That is a problem.

The alternative is to reserve the first 8 bytes in the shadow stack for the
top of stack pointer.

Prolog
======

Push the previous frame pointer and link register (register that contains
the return address) on the shadow stack.

Epilog
======

Pop the shadow stack.
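
To make the prolog and epilog concrete, here is a C model of the push and pop
for a compact shadow stack that keeps the top of stack pointer in its first
8 bytes. The real thing would be a handful of instructions patched in over the
no-ops, not C. The struct layout, the depth and the function names are
hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define SHADOW_DEPTH 64	/* assumed capacity, for illustration only */

/* One shadow entry: the previous frame pointer and the return address. */
struct shadow_entry {
	uint64_t fp;
	uint64_t lr;
};

/*
 * Hypothetical layout: the first 8 bytes (on a 64-bit machine) hold the
 * top of stack pointer; the entries follow and grow toward higher addresses.
 */
struct shadow_stack {
	struct shadow_entry *top;	/* next free entry */
	struct shadow_entry entries[SHADOW_DEPTH];
};

static void shadow_init(struct shadow_stack *ss)
{
	ss->top = ss->entries;
}

/* Prolog: push the previous frame pointer and the link register. */
static int shadow_push(struct shadow_stack *ss, uint64_t fp, uint64_t lr)
{
	if (ss->top == ss->entries + SHADOW_DEPTH)
		return -1;	/* shadow stack is full */
	ss->top->fp = fp;
	ss->top->lr = lr;
	ss->top++;
	return 0;
}

/* Epilog: pop the most recent entry into *out. */
static int shadow_pop(struct shadow_stack *ss, struct shadow_entry *out)
{
	if (ss->top == ss->entries)
		return -1;	/* shadow stack is empty */
	ss->top--;
	*out = *ss->top;
	return 0;
}
```

In the patched-in assembly, loading and updating the top pointer is what the
scratch registers would be needed for.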

Scratch registers
=================

A couple of scratch registers have to be used for the prolog and epilog. Perhaps
x16 and x17 can be used? They are supposed to be caller-saved intra-procedure-call
scratch registers (IP0 and IP1 in the procedure call standard).

Unwinder logic
==============

The unwinder will walk the stack using frame pointers like it does
currently. As it unwinds the regular stack, it will also unwind the
shadow stack.

At each step, however, it needs to perform some additional checks:

        symbol = lookup symbol table entry for pc
        if (!symbol)
                return -EINVAL;

        if (symbol does not have proper prolog and epilog)
                return -EINVAL;

        Compare the information stored on the regular stack with
        that stored on the shadow stack.

        if (the info does not match)
                return -EINVAL;

        Success for this frame
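
The same per-frame check, written out as a small C sketch. The symbol table
here is just an array with a hypothetical has_prolog_epilog flag, and the
struct and function names are made up; the real unwinder would use the kernel's
own symbol lookup and frame record types.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical symbol table entry carrying the flag set by objtool. */
struct sym {
	uint64_t start;
	uint64_t end;			/* exclusive */
	bool has_prolog_epilog;
};

/* Frame pointer and return address as saved for one frame. */
struct frame_record {
	uint64_t fp;
	uint64_t lr;
};

/* Linear lookup of the symbol that contains pc; NULL if there is none. */
static const struct sym *lookup_sym(const struct sym *tab, size_t n,
				    uint64_t pc)
{
	for (size_t i = 0; i < n; i++) {
		if (pc >= tab[i].start && pc < tab[i].end)
			return &tab[i];
	}
	return NULL;
}

/* Returns 0 if this frame can be trusted, -EINVAL otherwise. */
static int check_frame(const struct sym *tab, size_t n, uint64_t pc,
		       const struct frame_record *regular,
		       const struct frame_record *shadow)
{
	const struct sym *s = lookup_sym(tab, n, pc);

	if (!s)
		return -EINVAL;
	if (!s->has_prolog_epilog)
		return -EINVAL;
	if (regular->fp != shadow->fp || regular->lr != shadow->lr)
		return -EINVAL;
	return 0;
}
```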


longjmp style situations
========================

Let us say we unwind the regular stack by some number of frames. We need to
unwind the shadow stack as well. After unwinding the regular stack first,
we can take the current frame pointer of the regular stack, locate the
entry on the shadow stack that matches it, and make that entry the
top of the shadow stack.
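
A sketch of that resynchronization in C, using a compact shadow stack with the
top pointer in the first 8 bytes. It assumes that the fp value recorded in each
shadow entry is what gets compared against the current frame pointer; the
struct layout and the name shadow_resync() are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define SHADOW_DEPTH 64	/* assumed capacity, for illustration only */

struct shadow_entry {
	uint64_t fp;
	uint64_t lr;
};

struct shadow_stack {
	struct shadow_entry *top;	/* next free entry */
	struct shadow_entry entries[SHADOW_DEPTH];
};

/*
 * Search from the newest entry toward the oldest for one whose fp
 * matches, and make it the topmost live entry. Returns 0 on success,
 * or -1 (leaving the shadow stack untouched) if nothing matches.
 */
static int shadow_resync(struct shadow_stack *ss, uint64_t fp)
{
	struct shadow_entry *e = ss->top;

	while (e != ss->entries) {
		e--;
		if (e->fp == fp) {
			ss->top = e + 1;	/* drop everything newer */
			return 0;
		}
	}
	return -1;
}
```

Searching from the newest entry means the walk stops at the nearest matching
frame, which is the one a longjmp style unwind would land on.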

Summary
=======

I think a shadow stack will work perfectly for livepatch and will need only
a little help from objtool.

I don't know what the performance will be like though.

I have probably missed some corner cases. Please comment.

Madhavan


