[RFC PATCH 0/3] arm64: Implement reliable stack trace

Madhavan T. Venkataraman madvenka at linux.microsoft.com
Mon Feb 1 16:38:53 EST 2021

On 2/1/21 10:02 AM, Mark Rutland wrote:
> Hi Madhavan,
> On Mon, Feb 01, 2021 at 09:21:43AM -0600, Madhavan T. Venkataraman wrote:
>> On 1/28/21 9:26 AM, Josh Poimboeuf wrote:
>>>> If we're trusting the compiler we can probably just do that without any
>>>> explicit support from the compiler - it should be doing the standard
>>>> stuff unless we explicitly ask it to and if it isn't then it might be a
>>>> result of a mismatch in assumptions rather than a deliberate decision to
>>>> do something non-standard.  My understanding with objtool is that a big
>>>> part of the idea is to provide a static check that the binary we end up
>>>> with matches the assumptions that we are making so the fact that it's a
>>>> separate implementation is important.
>>> For C code, even if we trusted the compiler (which we don't), we still
>>> have inline asm which the compiler doesn't have any visibility to, which
>>> is more than capable of messing up frame pointers (we had several cases
>>> of this in x86).
>>> Getting the assembler to annotate which functions are FP could be
>>> interesting but:
>>> a) good luck getting the assembler folks to do that; they tend to be
>>>    insistent on being ignorant of code semantics;
>>> b) assembly often resembles spaghetti and the concept of a function is
>>>    quite fluid; in fact many functions aren't annotated as such.
>> OK. Before this whole discussion, I did not know that the compiler cannot be trusted.
> I think "the compiler cannot be trusted" is overly strong. We want to
> *verify* that the compiler is doing what we expect, which might be more
> than what it guarantees to do (and those expectations can change over
> time), but it doesn't mean we should try to avoid the compiler wherever
> possible.

Agreed. My intent was also that if the compiler's implementation is not
performant, we could implement our own, if necessary. For instance, if
the compiler implements a compact stack and we want to implement a
parallel stack in the kernel for performance, the compiler's prolog and
epilog will not work for us.

But if we are happy with what the compiler is providing, I am fine
with it.

> For assembly I expect we'll need to do /some/ manual annotation (e.g.
> for trampolines).
>> So, it looks like objtool is definitely needed. However, I believe we can minimize
>> the work objtool does by using a shadow stack.
>> I read Mark Brown's response to my shadow stack email. I agree with him. The shadow
>> stack looks promising.
>> So, here is my suggestion for the shadow stack. This is just to start the discussion
>> on the shadow stack.
> Regarding unwinding, shadow stacks have the same problems as frame
> records (considering exceptions/interrupts) in that the primary problem
> is knowing *when* they are updated, and knowing *when* the LR is
> valid or invalid or duplicated (in a frame record or shadow stack
> entry).

True. This is a problem for the unwinder in general.

So, I have a few questions from a livepatch perspective.

For livepatch, the kernel makes sure that task is not running when its stack is checked,
correct? The only possibility I can think of is that the task could have received an
interrupt and could have been preempted at the end of the interrupt. The interrupt
could have happened during the frame pointer prolog or epilog. Is this the problem case
for livepatch?

If the unwinder could check a flag in the task that indicates that the task was interrupted,
the unwinder could declare the stack trace unreliable. E.g., a (hacky) solution could
be to set and clear the flag in preempt_schedule_irq() which takes a task off a CPU
when it is preempted at the end of an interrupt. The flag would remain set while the task is not
on a CPU.

Similarly, for exceptions, can we set a flag in a task indicating that it is processing
an exception? Is there a top level exception handler where we can do this? Is there common
code that exception handlers use where we can set this? Or, can we deduce this from ptregs->pstate
that is saved for the task?

Mind you, the flag is advisory. If the unwinder has some way to unwind through an exception,
more power to it.

> Given that, I think that assuming we must use a shadow stack for
> reliable unwinding would be jumping the gun.

So, this is the problem I was considering. Let us say that a function properly sets up the
frame pointer at the beginning and properly restores it to the previous value when it
returns. But because of compiler bugs or some inline assembly code or other errant code,
the frame pointer gets modified in the middle of the function. Then, the function calls
another function. Then, the unwinder tries to unwind the stack. The unwinder has no
way of knowing that the frame pointer was modified. To tackle this problem, Objtool
has to laboriously walk all the code paths and track every modification to the stack and
the frame pointer. And, if there are frame modifications, it has to fail the kernel build.
Did I understand it correctly?

In these cases, the shadow stack can be used to unwind the stack. The shadow stack has
return addresses pushed on it. For livepatch purposes, this good enough.

>> Prolog and epilog for C functions
>> =================================
>> Some shadow stack prolog and epilog are needed. Let us add a new option to the compiler
>> to generate extra no-ops at the beginning of a function for the prolog and just before
>> return for the epilog so some other entity such as objtool can add its own prolog and
>> epilog. This is so we don't have to trust the compiler and can maintain our own prolog
>> and epilog.
> Why wouldn't we ask the compiler to to this, and just check this in the
> tooling?
> ... and if we can do that, why not just check the frame pointer
> manipulation?

I am fine with the compiler doing it and objtool checking it. I was just thinking
that the conventional compact shadow stack is not very performant based on studies.
While it may be OK for apps, it may not be OK for the kernel.

But this is something that we have to actually measure. May be, we can use Clang and
do an experiment to get some measurements.

> Note that functions can have multiple return paths, and there might not
> be one epilogue. Also, today some functions can have special-cased
> prologues for early checks, e.g.
> | function:
> | 	CBNZ	X0, _func
> | 	RET
> | 	STP	X29, X30, [SP, #-FRAME_SIZE]
> | 	MOV	X29, X30
> | 	...
> | 	LDP	X29, X30, [SP], #FRAME_SIZE
> | 	RET
> ... and forcing additional work in those could be detrimental.

I did not say that there would only be one return instruction for a function.
But my bad. I forgot to include the branch instruction as something that also
has to be decoded so objtool can follow all the code paths within a function.
Anyway, whatever needs to be decoded to follow all the paths within a function
must be done.

>> Objtool will check for the no-ops. If they are present, it will replace the no-ops with
>> the shadow stack prolog and epilog. It can also check the frame pointer prolog and
>> epilog.
> I suspect this will interact poorly with patchable-function-entry, which
> prefixes each instrumentable function with some NOPs.

Objtool knows if the kernel was configured with tracing. The compiler inserts a fixed,
known number of no-ops for tracing purposes. So, why is it difficult for objtool to
find the prolog/epilog no-ops?

Mind you, I am already fine with the compiler generating the code.

>> Then, it will set a flag in the symbol table entry of the function to indicate that
>> the function has a proper prolog and epilog.
> I think this boils down to having a prologue and epilogue check, which
> seems sane.

Yes. That is fine.

>> Prolog and epilog for assembly functions
>> ========================================
>> The no-ops and frame pointer prolog and epilog can be added to assembly functions manually.
>> Objtool will process them as above.
>> Decoding
>> ========
>> To do all this, objtool has to decode only the following instructions.
>>         - no-op
>>         - return instruction
>> 	- store register pair in frame pointer prolog
>> 	- load register pair in frame pointer epilog
>> This simplifies the objtool part a lot. AFAIK, all instructions in ARM64 are
>> 32 bits wide. So, objtool does not have to decode an instruction to know its
>> length.
>> Objtool has to scan a function for the return instruction to know the location(s)
>> of the epilog.
> That wouldn't be robust if you consider things like:
> | func:
> | 	STP	x29, x30, [SP, #-FRAME_SIZE]!
> |	MOV	X29, SP
> | 	B	__fake_ret
> |	LDP	X29, X30, [SP], #FRAME_SIZE
> |	RET
> | __fake_ret:
> | 	BL	x30
> ... which is the sort of thing we want objtool to catch.

Objtool has to follow all the code paths to check every single possible return
as I mentioned above.

> [...]
>> Unwinder logic
>> ==============
>> The unwinder will walk the stack using frame pointers like it does
>> currently. As it unwinds the regular stack, it will also unwind the
>> shadow stack:
>> However, at each step, it needs to perform some additional checks:
>>         symbol = lookup symbol table entry for pc
>>         if (!symbol)
>>                 return -EINVAL;
>>         if (symbol does not have proper prolog and epilog)
>>                 return -EINVAL;
> I think at this point, we haven't gained anything from using a shadow
> stack, and I'd much rather we used objtool to gather any metadata needed
> to make unwinding reliable without mandating a shadow stack.

I think we have gained something. Pushing the return addresses on the shadow stack
and using them to unwind means that objtool does not have to decode every single
instruction and track the changes to the stack and frame state. It also means
that the kernel build does not have to be failed when some frame modification is
detected by objtool.


More information about the linux-arm-kernel mailing list