[RFC PATCH 0/3] arm64: Implement reliable stack trace
Madhavan T. Venkataraman
madvenka at linux.microsoft.com
Wed Feb 3 14:03:02 EST 2021
On 2/3/21 10:53 AM, Mark Rutland wrote:
> On Tue, Feb 02, 2021 at 05:32:32PM -0600, Madhavan T. Venkataraman wrote:
>> On 2/2/21 4:05 AM, Mark Rutland wrote:
>>> I think that practically speaking it's necessary to track all potential
>>> paths through functions that may alter the shadow stack or the shadow
>>> stack pointer to ensure that the manipulation is well-balanced and that
>>> the shadow stack pointer isn't corrupted.
>>> Practically speaking, this requires decoding a significant number of
>>> instructions, and tracing through all potential paths a function may
>> I thought about it some more since you don't like the shadow stack
>> that much.
> Just to be clear, what I was trying to get across was:
> * Whatever we do, we want to verify at compile time that the kernel
> binary matches our expectations, and practically speaking this almost
> certainly means using objtool.
> * The analysis that objtool will have to do is not made significantly
> simpler through the use of a shadow stack, as it still needs to track
> all paths through a function, etc.
Actually, traversing all the paths within a function is not the tough part, IMHO.
A subset of the instructions must be decoded. I think what is tough is decoding
every single instruction that can potentially use the stack and the frame pointer
registers, tracking the stack and frame state through all of that accurately and
finding violations - that is the real work.
I will study what Julien has implemented. If he has already done most of the work,
then this whole discussion is moot.
> I'm not averse to using a shadow stack (and clang's SCS is a useful
> security feature), I just don't think that it helps much with
> compile-time verification, and I don't see a strong reason to mandate it
> for livepatching.
Actually, the security feature of shadow stacks is probably not useful for
the kernel. IIUC, the security feature is that the overwriting of the return
address through buffer overflow attacks can be avoided if there is a shadow
stack. That is not relevant to the kernel.
I still feel that making the prolog and epilog smarter can save a significant
amount of objtool work.
>> The goal is - even if a function modifies fp and/or does not restore sp to its
>> correct value at the end, the prolog and epilog should manage it so that everything
>> works. To do this, the current frame pointer address is stored in fp as well as cur_fp.
>> Even if fp is modified, cur_fp will still point to the correct frame address.
>> The original prolog is:
>> - Push fp and return address on the stack
>> - fp = sp
>> The new prolog is:
>> - Save cur_fp on the stack
>> - Push fp, return address on the stack
>> - fp = sp
>> - cur_fp = fp
>> The original epilog is:
>> - Pop fp and return address
>> The new epilog is:
>> - sp = cur_fp
>> - Pop fp and return address
>> - Restore cur_fp from the stack
>> I think this is pretty simple.
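The quoted prolog and epilog steps can be sketched as a toy model. This is only an illustration under stated assumptions, not compiler output or kernel code: the `ToyCPU` class, its word-addressed memory, and the idea that `cur_fp` behaves like one extra register-like slot are all hypothetical. The model also says nothing about the asynchronous-exception concern raised in the thread, since it has no notion of interrupts.

```python
class ToyCPU:
    """Toy word-addressed machine; the stack grows toward lower addresses."""

    def __init__(self, stack_words=64):
        self.mem = [0] * stack_words
        self.sp = stack_words   # empty stack: sp is one past the highest word
        self.fp = 0             # frame pointer register
        self.lr = 0             # return address register
        self.cur_fp = 0         # hypothetical slot holding the true frame address

    def push(self, value):
        self.sp -= 1
        self.mem[self.sp] = value

    def pop(self):
        value = self.mem[self.sp]
        self.sp += 1
        return value

    def prolog(self):
        self.push(self.cur_fp)  # save the caller's cur_fp on the stack
        self.push(self.lr)      # push return address ...
        self.push(self.fp)      # ... and fp, forming the frame record
        self.fp = self.sp       # fp = sp
        self.cur_fp = self.fp   # cur_fp = fp

    def epilog(self):
        self.sp = self.cur_fp   # sp = cur_fp, whatever the body did to sp or fp
        self.fp = self.pop()    # pop fp ...
        self.lr = self.pop()    # ... and return address
        self.cur_fp = self.pop()  # restore the caller's cur_fp


def run_misbehaving_function():
    """Run a body that clobbers fp and leaves junk on the stack."""
    cpu = ToyCPU()
    cpu.fp, cpu.lr, cpu.cur_fp = 100, 0x1000, 100  # arbitrary caller state
    before = (cpu.sp, cpu.fp, cpu.lr, cpu.cur_fp)
    cpu.prolog()
    cpu.fp = 0xDEAD             # body clobbers the frame pointer
    cpu.push(1)                 # body pushes values and never pops them
    cpu.push(2)
    cpu.epilog()
    after = (cpu.sp, cpu.fp, cpu.lr, cpu.cur_fp)
    return before, after
```

In this model `run_misbehaving_function()` returns identical `before` and `after` tuples, which is the property the proposal is after: the caller's state is recovered from `cur_fp` rather than from `fp` or `sp`.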
> I'm afraid I don't understand the problem you're trying to solve here.
> The epilog you propose is also unsound in the face of asynchronous
> exceptions, so I suspect you haven't thought as hard about this as you
> need to.
Asynchronous exceptions are a problem even with the existing frame
pointer prolog and epilog. What is the extra problem that the new
prolog and epilog introduce? The only additional thing I am
introducing is the saving of the fp to a memory location and
restoring it. I am not sure I see how that can be a problem. But
if it is a problem, I would like to understand it. Can you elaborate?
> Even if the compiler uses a different prologue/epilogue sequence, we
> still need to verify that the rest of the function does nothing to
> undermine that.
What can the function do to undermine that? The epilog already
handles the clobbering of the fp. It handles the case where a
function has pushed something on to the stack and has not
The only other thing I can think of is a function clobbering the
cur_fp location itself. For that matter, a function can clobber
any location on the stack. Objtool will not be able to detect that.
But it is possible I have missed something. Can you elaborate?
> I think this is merely different rather than simpler, and regardless
> this would be an invasive change to compilers.
It is a simpler change as compared to a shadow stack.
I believe all toolchain changes that have been done to specifically support
Linux kernel features have been invasive. Have they not?
>> The unwinder will start the stack walk from cur_fp instead of fp. At each frame,
>> it will use the saved cur_fp instead of the saved fp.
>> Also, at each step, it can know if fp was actually changed by the function in
>> the frame. The unwinder can optionally issue warnings.
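A minimal sketch of the walk described in the quoted text, again as a toy model rather than the arm64 unwinder. The frame layout (saved fp, return address, saved `cur_fp` in three consecutive words), the use of address 0 as the end marker, and the names `unwind` and `build_example` are assumptions made for illustration.

```python
def unwind(mem, cur_fp, fp):
    """Walk frames via saved cur_fp.

    Returns (return_addresses, clobbered_frames), where clobbered_frames
    lists the frame addresses whose owning function changed fp.
    """
    trace, clobbered = [], []
    frame, observed_fp = cur_fp, fp
    while frame != 0:                    # 0 marks the end of the chain here
        if observed_fp != frame:
            clobbered.append(frame)      # fp no longer points at this frame
        saved_fp, ret_addr, saved_cur_fp = mem[frame:frame + 3]
        trace.append(ret_addr)
        # Follow the saved cur_fp, but remember the saved fp so the next
        # iteration can check whether the caller had modified it.
        frame, observed_fp = saved_cur_fp, saved_fp
    return trace, clobbered


def build_example():
    """Two frames: outer function A at 20, inner function B at 10.

    A clobbered fp (to 0xBAD) before calling B, so B's frame record holds
    a bogus saved fp but a correct saved cur_fp.
    """
    mem = [0] * 32
    mem[20:23] = [0, 0xA0, 0]            # A: saved fp, return addr, saved cur_fp
    mem[10:13] = [0xBAD, 0xB0, 20]       # B: bogus saved fp, correct saved cur_fp
    return mem, 10, 10                   # memory, live cur_fp, live fp
```

Here `unwind(*build_example())` visits both frames and reports frame 20 as one whose function changed fp, whereas a walk that followed the saved fp would have jumped to the bogus address `0xBAD`.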
> So this is just about additional robustness?
> I'm happy to use a shadow stack for /additional/ robustness, I just
> don't think a shadow stack alone solves all the other issues that we
> need compile time verification for, nor does it solve cases where we
> might want metadata generated at compile time.
I am only discussing reliable stack trace and objtool's part in it. I am cool
with all the other issues objtool tackles.
>> Compiler issue
>> This solution is geared towards the kernel only. It assumes that the stack
>> has a fixed size and alignment so the bottom of the stack can be reached
>> from the current sp.
>> So, the compiler has to support two prologs and epilogs, one pair for apps
>> and one pair for the kernel.
>> Since this is just a tiny bit of code, I don't think this is a problem.
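The fixed size and alignment assumption in the quoted text is what lets a stack boundary be computed from sp alone. A small sketch of that arithmetic follows; the 16 KiB size, the choice of the lowest address as the "bottom", and the function name `stack_base` are assumptions for illustration, not values taken from the proposal.

```python
THREAD_SIZE = 16 * 1024  # assumed stack size in bytes; must be a power of two


def stack_base(sp, thread_size=THREAD_SIZE):
    """Return the lowest address of the stack containing sp.

    Only valid if every stack is thread_size bytes long and aligned to
    thread_size, which is the assumption stated in the quoted text.
    """
    return sp & ~(thread_size - 1)
```

For example, with these assumed values any sp between `0x4000` and `0x7fff` maps to the base `0x4000`. User-space stacks generally offer no such guarantee, which is why the quoted text treats this as a kernel-only scheme.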
> I suspect it's more complicated than that. Configuration differences like
> this can easily double the necessary testing, and are liable to become
> a maintenance burden, so I don't expect compiler folk are likely to want
> to support this unless necessary.
I think this is pretty much true of any compiler change you request for the
> At the moment, I don't think that this solves a real problem, and I
> don't think that we need to change this.
I do think it solves the problem of making the stack frame immune to
the function clobbering the frame pointer. That is totally relevant to
reliable stack trace.
Anyway, I think I will move on to other things (unless someone has an
interest in this topic).