[RFC PATCH v1 0/4] arm64: Implement stack trace reliability checks

Sun Apr 4 04:29:12 BST 2021

On 4/3/21 12:01 PM, Josh Poimboeuf wrote:
> On Tue, Mar 30, 2021 at 02:09:51PM -0500, madvenka at linux.microsoft.com wrote:
>> From: "Madhavan T. Venkataraman" <madvenka at linux.microsoft.com>
>>
>> There are a number of places in kernel code where the stack trace is not
>> reliable. Enhance the unwinder to check for those cases and mark the
>> stack trace as unreliable. Once all of the checks are in place, the unwinder
>> can be used for livepatching.
> 
> This assumes all such places are known.  That's a big assumption, as
> 
> a) hand-written asm code may inadvertantly skip frame pointer setup;
> 
> b) for inline asm which calls a function, the compiler may blindly
>    insert it into a function before the frame pointer setup.
> 
> That's where objtool stack validation would come in.
>    

Yes. I meant that the reliable stack trace in the kernel is necessary. I did
not imply that it was sufficient. Clearly, it is not sufficient. It relies
on the frame pointer being setup correctly for all functions. That has to be
guaranteed by another entity such as objtool.

So, I will improve the wording and make it clear in the next version.

>> Detect EL1 exception frame
>> ==========================
>>
>> EL1 exceptions can happen on any instruction including instructions in
>> the frame pointer prolog or epilog. Depending on where exactly they happen,
>> they could render the stack trace unreliable.
>>
>> Add all of the EL1 exception handlers to special_functions[].
>>
>> 	- el1_sync()
>> 	- el1_irq()
>> 	- el1_error()
>> 	- el1_sync_invalid()
>> 	- el1_irq_invalid()
>> 	- el1_fiq_invalid()
>> 	- el1_error_invalid()
> 
> A possibly more robust alternative would be to somehow mark el1
> exception frames so the unwinder can detect them more generally.
> 
> For example, as described in my previous email, encode the frame pointer
> so the unwinder can detect el1 frames automatically.
> 

Encoding the frame pointer by setting the LSB (like X86) was my first solution.
Mark Rutland NAKed it. His objection was that it would confuse the debuggers
which are expecting the last 4 bits of the frame pointer to be 0. I agree with
this objection.

My problem with the encoding was also that it is not possible to know if the
LSB was set because of encoding or because of stack corruption.

My second attempt was to encode the frame pointer indirectly. That is, make
pt_regs->stackframe the exception frame and use other fields in the pt_regs
(including a frame type encoding field) for verification.

Mark Rutland NAKed it. His objection (if I am rephrasing it correctly) was that
garbage on the stack may accidentally match the values the unwinder checks in
the pt_regs (however unlikely that match might be).

The consensus was that the return PC must be checked against special functions
to recognize those special cases as the special functions are only invoked in
those special contexts and nowhere else.

As an aside, Mark Brown suggested (if I recall correctly) that the exception
functions could be placed in a special exception section so the unwinder can
check a return PC against the section bounds instead of individual functions.
I did consider implementing this. But I needed a way to address FTRACE
trampolines and KPROBE trampolines as well. So, I did not do that.

>> Detect ftrace frame
>> ===================
>>
>> When FTRACE executes at the beginning of a traced function, it creates two
>> frames and calls the tracer function:
>>
>> 	- One frame for the traced function
>>
>> 	- One frame for the caller of the traced function
>>
>> That gives a sensible stack trace while executing in the tracer function.
>> When FTRACE returns to the traced function, the frames are popped and
>> everything is back to normal.
>>
>> However, in cases like live patch, the tracer function redirects execution
>> to a different function. When FTRACE returns, control will go to that target
>> function. A stack trace taken in the tracer function will not show the target
>> function. The target function is the real function that we want to track.
>> So, the stack trace is unreliable.
> 
> I don't think this is a real problem.  Livepatch only checks the stacks
> of blocked tasks (and the task calling into livepatch).  So the
> reliability of unwinding from the livepatch tracer function itself
> (klp_ftrace_handler) isn't a concern since it doesn't sleep.
> 

My thinking was - arch_stack_walk_reliable() should provide a reliable stack trace
and not assume anything about its consumers. It should not assume that livepatch is
the only consumer although it might be. 

Theoretically, there can be a tracer function that calls some kernel function F() that
can go to sleep. Is this not allowed?

Or, F() could call arch_stack_walk_reliable() on the current task for debugging
or tracing purposes. It should still work correctly.

>> To detect FTRACE in a stack trace, add the following to special_functions[]:
>>
>> 	- ftrace_graph_call()
>> 	- ftrace_graph_caller()
>>
>> Please see the diff for a comment that explains why ftrace_graph_call()
>> must be checked.
>>
>> Also, the Function Graph Tracer modifies the return address of a traced
>> function to a return trampoline (return_to_handler()) to gather tracing
>> data on function return. Stack traces taken from the traced function and
>> functions it calls will not show the original caller of the traced function.
>> The unwinder handles this case by getting the original caller from FTRACE.
>>
>> However, stack traces taken from the trampoline itself and functions it calls
>> are unreliable as the original return address may not be available in
>> that context. This is because the trampoline calls FTRACE to gather trace
>> data as well as to obtain the actual return address and FTRACE discards the
>> record of the original return address along the way.
> 
> Again, this shouldn't be a concern because livepatch won't be unwinding
> from a function_graph trampoline unless it got preempted somehow (and
> then the el1 frame would get detected anyway).
> 

Please see previous answer.

>> Add return_to_handler() to special_functions[].
>>
>> Check for kretprobe
>> ===================
>>
>> For functions with a kretprobe set up, probe code executes on entry
>> to the function and replaces the return address in the stack frame with a
>> kretprobe trampoline. Whenever the function returns, control is
>> transferred to the trampoline. The trampoline eventually returns to the
>> original return address.
>>
>> A stack trace taken while executing in the function (or in functions that
>> get called from the function) will not show the original return address.
>> Similarly, a stack trace taken while executing in the trampoline itself
>> (and functions that get called from the trampoline) will not show the
>> original return address. This means that the caller of the probed function
>> will not show. This makes the stack trace unreliable.
>>
>> Add the kretprobe trampoline to special_functions[].
>>
>> FYI, each task contains a task->kretprobe_instances list that can
>> theoretically be consulted to find the orginal return address. But I am
>> not entirely sure how to safely traverse that list for stack traces
>> not on the current process. So, I have taken the easy way out.
> 
> For kretprobes, unwinding from the trampoline or kretprobe handler
> shouldn't be a reliability concern for live patching, for similar
> reasons as above.
> 

Please see previous answer.

> Otherwise, when unwinding from a blocked task which has
> 'kretprobe_trampoline' on the stack, the unwinder needs a way to get the
> original return address.  Masami has been working on an interface to
> make that possible for x86.  I assume something similar could be done
> for arm64.
> 

OK. Until that is available, this case needs to be addressed.

Madhavan