[RFC PATCH v1 1/1] arm64: Unwinder enhancements for reliable stack trace

Madhavan T. Venkataraman madvenka at linux.microsoft.com
Tue Feb 23 14:20:49 EST 2021



On 2/23/21 1:02 PM, Mark Brown wrote:
> On Tue, Feb 23, 2021 at 12:12:43PM -0600, madvenka at linux.microsoft.com wrote:
>> From: "Madhavan T. Venkataraman" <madvenka at linux.microsoft.com>
>>
>> Unwinder changes
>> ================
> 
> This is making several different changes so should be split into a patch
> series - for example the change to terminate on a specific function
> pointer rather than NULL and the changes to the exception/interupt
> detection should be split.  Please see submitting-patches.rst for some
> discussion about how to split things up.  In general if you've got a
> changelog enumerating a number of different changes in a patch that's a
> warning sign that it might be good split things up.
> 

Will do.

> You should also copy the architecture maintainers (Catalin and Will) on
> any arch/arm64 submissions.
> 

Will do when I resubmit.

>> 	Unwinder return value
>> 	=====================
>>
>> 	Currently, the unwinder returns -EINVAL for stack trace termination
>> 	as well as stack trace error. Return -ENOENT for stack trace
>> 	termination and -EINVAL for error to disambiguate. This idea has
>> 	been borrowed from Mark Brown.
> 
> You could just include my patch for this in your series.
> 

OK.

>> Reliable stack trace function
>> =============================
>>
>> Implement arch_stack_walk_reliable(). This function walks the stack like
>> the existing stack trace functions with a couple of additional checks:
>>
>> 	Return address check
>> 	--------------------
>>
>> 	For each frame, check the return address to see if it is a
>> 	proper kernel text address. If not, return -EINVAL.
>>
>> 	Exception frame check
>> 	---------------------
>>
>> 	Check each frame to see if it is an EL1 exception frame. If it is,
>> 	return -EINVAL.
> 
> Again, this should be at least one separate patch.  How does this ensure
> that we don't have any issues with any of the various probe mechanisms?
> If there's no need to explicitly check anything that should be called
> out in the changelog.
> 

I am trying to do this in an incremental fashion. I have to study the probe
mechanisms a little bit more before I can come up with a solution. But
if you want to see that addressed in this patch set, I could do that.
It will take a little bit of time. That is all.

> Since all these changes are mixed up this is a fairly superficial
> review of the actual code.
> 

Understood. I will split things up and we can take it from there.

>> +static notrace struct pt_regs *get_frame_regs(struct task_struct *task,
>> +					      struct stackframe *frame)
>> +{
>> +	unsigned long stackframe, regs_start, regs_end;
>> +	struct stack_info info;
>> +
>> +	stackframe = frame->prev_fp;
>> +	if (!stackframe)
>> +		return NULL;
>> +
>> +	(void) on_accessible_stack(task, stackframe, &info);
> 
> Shouldn't we return NULL if we are not on an accessible stack?
> 

The prev_fp has already been checked by the unwinder in the previous
frame. That is why I don't check the return value. If that is acceptable,
I will add a comment.

>> +static notrace int update_frame(struct task_struct *task,
>> +				struct stackframe *frame)
> 
> This function really needs some documentation, the function is just
> called update_frame() which doesn't say what sort of updates it's
> supposed to do and most of the checks aren't explained, not all of them
> are super obvious.
> 

I will add the documentation as well as try think of a better name.

>> +{
>> +	unsigned long lsb = frame->fp & 0xf;
>> +	unsigned long fp = frame->fp & ~lsb;
>> +	unsigned long pc = frame->pc;
>> +	struct pt_regs *regs;
>> +
>> +	frame->exception_frame = false;
>> +
>> +	if (fp == (unsigned long) arm64_last_frame &&
>> +	    pc == (unsigned long) arm64_last_func)
>> +		return -ENOENT;
>> +
>> +	if (!lsb)
>> +		return 0;
>> +	if (lsb != 1)
>> +		return -EINVAL;
>> +
>> +	/*
>> +	 * This looks like an EL1 exception frame.
> 
> For clarity it would be good to spell out the properties of an EL1
> exception frame.  It is not clear to me why we don't reference the frame
> type information the unwinder already records as part of these checks.
> 
> In general, especially for the bits specific to reliable stack trace, I
> think we want to err on the side of verbosity here so that it is crystal
> clear what all the checks are supposed to be doing and it's that much
> easier to tie everything through to the requirements document.

OK. I will improve the documentation.

Madhavan



More information about the linux-arm-kernel mailing list