[RFC PATCH v2 3/3] arm64: stacktrace: Prevent looping and invalid stack transitions

Dave Martin Dave.Martin at arm.com
Fri Apr 27 04:34:03 PDT 2018


On Fri, Apr 27, 2018 at 12:23:21PM +0100, James Morse wrote:
> Hi Dave,
> 
> On 23/04/18 18:07, Dave Martin wrote:
> > In the event of stack corruption, backtraces may loop indefinitely
> > or wander off into memory that is not a valid stack for the context
> > being backtraced.
> > 
> > This patch makes backtracing more robust against stack corruption,
> > by taking two things into account:
> > 
> >  * while staying on the same stack, the frame address must strictly
> >    increase from each frame to its ancestor;
> > 
> >  * when transitioning to another stack, the destination stack must
> >    be a stack that has not been visited already.
> > 
> > on_accessible_stack() is converted to a more expressive helper
> > identify_stack() that also tells the caller _which_ stack the task
> > appears to be on.  A field stack_id is added to struct stackframe
> > to track the result of this for the frame, along with a bitmap
> > stacks_used that records which stacks have been visited by the
> > backtrace.
> > 
> > Now that it is easy to check whether two successive frames are on
> > the same stack, it becomes straightfowrard to add a check for
> > strictly increasing frame address, to avoid looping around the same
> > stack: this patch adds that too.
> > 
> > This patch does not attempt to catch invalid transitions such as
> > from the task stack to the IRQ stack.  It turns out that the way
> > the overflow stack is used makes this complicated.  Nonetheless,
> > the number of frames parsed before the backtracer terminates should
> > now be guaranteed to be finite.
> 
> Sounds like a good idea, I gave this a quick spin with SDEI...
> 
> With this series on top of the SDEI test goo:
> [51497.386354] CPU: 0 PID: 0 Comm: swapper/0 Not tainted
> 4.17.0-rc2-00023-g144855c44b20-dirty #9721
> [51497.386366] Call trace:
> [51497.386409]  dump_backtrace+0x0/0x1b8
> 
> 
> Without this series:
> [275604.647803] CPU: 0 PID: 0 Comm: swapper/0 Not tainted
> 4.17.0-rc2-00018-gcf83397e1eaf-dirty #9722
> [275604.647814] Call trace:
> [275604.647857]  dump_backtrace+0x0/0x190
> [275604.647900]  show_stack+0x14/0x20
> [275604.647938]  dump_stack+0xac/0xe4
> [275604.647972]  __sdei_handler+0x184/0x1d0
> [275604.648009]  __sdei_asm_handler+0xbc/0x130
> [275604.648050]  cpu_do_idle+0x8/0xc
> [275604.648085]  default_idle_call+0x1c/0x38
> [275604.648115]  do_idle+0x1bc/0x270
> [275604.648147]  cpu_startup_entry+0x24/0x28
> [275604.648189]  rest_init+0x24c/0x260
> [275604.648226]  start_kernel+0x3b8/0x3cc
> 
> 
> (please forgive the timestamps, the FVP thinks its a time machine)
> 
> 
> > diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
> > index d5718a0..a5e0547 100644
> > --- a/arch/arm64/kernel/stacktrace.c
> > +++ b/arch/arm64/kernel/stacktrace.c
> > @@ -75,14 +77,30 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
> >  #endif /* CONFIG_FUNCTION_GRAPH_TRACER */
> >  
> >  	/*
> > -	 * Frames created upon entry from EL0 have NULL FP and PC values, so
> > -	 * don't bother reporting these. Frames created by __noreturn functions
> > -	 * might have a valid FP even if PC is bogus, so only terminate where
> > -	 * both are NULL.
> > +	 * Terminate when the next frame isn't on any valid stack for
> > +	 * tsk.  As a special case, frames created upon entry from EL0
> > +	 * have NULL FP and PC values, so will terminate here also.
> > +	 * Frames created by __noreturn functions might have a valid FP
> > +	 * even if PC is bogus, so only terminate where FP is invalid.
> >  	 */
> > -	if (!frame->fp && !frame->pc)
> > +	if (frame->stack_id == ARM64_STACK_NONE)
> >  		return -EINVAL;
> 
> 
> > +	/*
> > +	 * Reject attempts to recurse back onto a stack that has been
> > +	 * visited already:
> > +	 */
> > +	if (test_bit(frame->stack_id, frame->stacks_used))
> > +		return -EINVAL;
> 
> Won't this happen for every frame? So we only get one frame per stack.
> Could we move this into some special handling for stepping between stacks?

D'oh... that check is only supposed to happen on a stack switch.

> > +
> > +	/* Verify forward progress while on the same stack: */
> > +	if (frame->stack_id == stack_id) {
> > +		if (frame->fp <= fp)
> > +			return -EINVAL;
> > +	}

Can you try adding

	else {
		if (test_bit(frame->stack_id, frame->stacks_used))
			return -EINVAL;
	}

(and removing that above)

Cheers
---Dave



More information about the linux-arm-kernel mailing list