[RFC PATCH v2 4/8] arm64: Detect an EL1 exception frame and mark a stack trace unreliable

Madhavan T. Venkataraman madvenka at linux.microsoft.com
Tue Mar 23 12:46:10 GMT 2021



On 3/23/21 5:42 AM, Mark Rutland wrote:
> On Mon, Mar 15, 2021 at 11:57:56AM -0500, madvenka at linux.microsoft.com wrote:
>> From: "Madhavan T. Venkataraman" <madvenka at linux.microsoft.com>
>>
>> EL1 exceptions can happen on any instruction including instructions in
>> the frame pointer prolog or epilog. Depending on where exactly they happen,
>> they could render the stack trace unreliable.
>>
>> If an EL1 exception frame is found on the stack, mark the stack trace as
>> unreliable.
>>
>> Now, the EL1 exception frame is not at any well-known offset on the stack.
>> It can be anywhere on the stack. In order to properly detect an EL1
>> exception frame the following checks must be done:
>>
>> 	- The frame type must be EL1_FRAME.
>>
>> 	- When the register state is saved in the EL1 pt_regs, the frame
>> 	  pointer x29 is saved in pt_regs->regs[29] and the return PC
>> 	  is saved in pt_regs->pc. These must match with the current
>> 	  frame.
> 
> Before you can do this, you need to reliably identify that you have a
> pt_regs on the stack, but this patch uses a heuristic, which is not
> reliable.
> 
> However, instead you can identify whether you're trying to unwind
> through one of the EL1 entry functions, which tells you the same thing
> without even having to look at the pt_regs.
> 
> We can do that based on the entry functions all being in .entry.text,
> which we could further sub-divide to split the EL0 and EL1 entry
> functions.
> 

Yes. I will check the entry functions. But I still think that we should
not rely on just one check. The additional checks will make it robust.
So, I suggest that the return address be checked first. If that passes,
then we can be reasonably sure that there are pt_regs. Then, check
the other things in pt_regs.

>>
>> Interrupts encountered in kernel code are also EL1 exceptions. At the end
>> of an interrupt, the interrupt handler checks if the current task must be
>> preempted for any reason. If so, it calls the preemption code which takes
>> the task off the CPU. A stack trace taken on the task after the preemption
>> will show the EL1 frame and will be considered unreliable. This is correct
>> behavior as preemption can happen practically at any point in code
>> including the frame pointer prolog and epilog.
>>
>> Breakpoints encountered in kernel code are also EL1 exceptions. The probing
>> infrastructure uses breakpoints for executing probe code. While in the probe
>> code, the stack trace will show an EL1 frame and will be considered
>> unreliable. This is also correct behavior.
>>
>> Signed-off-by: Madhavan T. Venkataraman <madvenka at linux.microsoft.com>
>> ---
>>  arch/arm64/include/asm/stacktrace.h |  2 +
>>  arch/arm64/kernel/stacktrace.c      | 57 +++++++++++++++++++++++++++++
>>  2 files changed, 59 insertions(+)
>>
>> diff --git a/arch/arm64/include/asm/stacktrace.h b/arch/arm64/include/asm/stacktrace.h
>> index eb29b1fe8255..684f65808394 100644
>> --- a/arch/arm64/include/asm/stacktrace.h
>> +++ b/arch/arm64/include/asm/stacktrace.h
>> @@ -59,6 +59,7 @@ struct stackframe {
>>  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
>>  	int graph;
>>  #endif
>> +	bool reliable;
>>  };
>>  
>>  extern int unwind_frame(struct task_struct *tsk, struct stackframe *frame);
>> @@ -169,6 +170,7 @@ static inline void start_backtrace(struct stackframe *frame,
>>  	bitmap_zero(frame->stacks_done, __NR_STACK_TYPES);
>>  	frame->prev_fp = 0;
>>  	frame->prev_type = STACK_TYPE_UNKNOWN;
>> +	frame->reliable = true;
>>  }
>>  
>>  #endif	/* __ASM_STACKTRACE_H */
>> diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
>> index 504cd161339d..6ae103326f7b 100644
>> --- a/arch/arm64/kernel/stacktrace.c
>> +++ b/arch/arm64/kernel/stacktrace.c
>> @@ -18,6 +18,58 @@
>>  #include <asm/stack_pointer.h>
>>  #include <asm/stacktrace.h>
>>  
>> +static void check_if_reliable(unsigned long fp, struct stackframe *frame,
>> +			      struct stack_info *info)
>> +{
>> +	struct pt_regs *regs;
>> +	unsigned long regs_start, regs_end;
>> +
>> +	/*
>> +	 * If the stack trace has already been marked unreliable, just
>> +	 * return.
>> +	 */
>> +	if (!frame->reliable)
>> +		return;
>> +
>> +	/*
>> +	 * Assume that this is an intermediate marker frame inside a pt_regs
>> +	 * structure created on the stack and get the pt_regs pointer. Other
>> +	 * checks will be done below to make sure that this is a marker
>> +	 * frame.
>> +	 */
> 
> Sorry, but NAK to this approach specifically. This isn't reliable (since
> it can be influenced by arbitrary data on the stack), and it's far more
> complicated than identifying the entry functions specifically.
> 

As I mentioned above, I agree that we should check the return address. But
just as a precaution, I think we should double check the pt_regs.

Is that OK with you? It does not take away anything or increase the risk in
anyway. I think it makes it more robust.

Thanks.

Madhavan



More information about the linux-arm-kernel mailing list