[PATCH RESEND] arm64: fault: avoid send SIGBUS two times

James Morse james.morse at arm.com
Thu Dec 7 06:32:52 PST 2017


Hi gengdongjiu, Will,

On 07/12/17 05:55, gengdongjiu wrote:
> On 2017/12/7 0:15, Will Deacon wrote:
>>> --- a/arch/arm64/mm/fault.c
>>> +++ b/arch/arm64/mm/fault.c
>>> @@ -570,7 +570,6 @@ static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs)
>>>  {
>>>  	struct siginfo info;
>>>  	const struct fault_info *inf;
>>> -	int ret = 0;
>>>  
>>>  	inf = esr_to_fault_info(esr);
>>>  	pr_err("Synchronous External Abort: %s (0x%08x) at 0x%016lx\n",
>>> @@ -585,7 +584,7 @@ static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs)
>>>  		if (interrupts_enabled(regs))
>>>  			nmi_enter();
>>>  
>>> -		ret = ghes_notify_sea();
>>> +		ghes_notify_sea();
>>>  
>>>  		if (interrupts_enabled(regs))
>>>  			nmi_exit();
>>> @@ -600,7 +599,7 @@ static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs)
>>>  		info.si_addr  = (void __user *)addr;
>>>  	arm64_notify_die("", regs, &info, esr);
>>>  
>>> -	return ret;
>>> +	return 0;

>> Hmm, so this code is a bit of mess.
>>
>> Wouldn't it be better to have the signal dispatching code in do_mem_abort
>> check ESR.ESR_ELx_FnV, so then do_sea wouldn't have to, and we could just
>> return an error instead?

FnV only applies to one of the Synchronous External Abort ESRs, hence it ended
up in here.
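
To make the FnV point concrete, here is a minimal user-space sketch that decodes the ESR value from the log quoted further down (0x96000410). The macro names mirror the ones in the kernel's asm/esr.h but are redefined locally, and struct esr_fields and decode_abort_esr() are hypothetical helpers for illustration, not kernel code. It assumes the architectural rule that FnV (bit 10) is only defined when the fault status code is 0x10, a synchronous external abort that is not on a translation table walk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the field layout; names follow asm/esr.h but are
 * redefined here so the sketch builds outside the kernel. */
#define ESR_ELx_EC_SHIFT	26
#define ESR_ELx_EC_MASK		(UINT32_C(0x3F) << ESR_ELx_EC_SHIFT)
#define ESR_ELx_FSC		UINT32_C(0x3F)
#define ESR_ELx_FSC_EXTABT	UINT32_C(0x10)
#define ESR_ELx_FnV		(UINT32_C(1) << 10)

/* Hypothetical container for the decoded fields. */
struct esr_fields {
	uint32_t ec;		/* exception class, bits [31:26] */
	uint32_t fsc;		/* fault status code, bits [5:0] */
	bool fnv_defined;	/* FnV only has meaning for FSC == 0x10 */
	bool far_not_valid;	/* FnV is defined and set */
};

/* Hypothetical helper: split an abort ESR into the fields above. */
static struct esr_fields decode_abort_esr(uint32_t esr)
{
	struct esr_fields f;

	f.ec = (esr & ESR_ELx_EC_MASK) >> ESR_ELx_EC_SHIFT;
	f.fsc = esr & ESR_ELx_FSC;
	/* For every other fault status code bit 10 is reserved, so it
	 * must not be interpreted as "FAR not valid". */
	f.fnv_defined = (f.fsc == ESR_ELx_FSC_EXTABT);
	f.far_not_valid = f.fnv_defined && (esr & ESR_ELx_FnV) != 0;
	return f;
}
```

For 0x96000410 this gives an exception class of 0x25, a fault status code of 0x10 and FnV set, so the address printed in that log line is not guaranteed to be the faulting address.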


> Regardless of ghes_notify_sea()'s return value, a signal always needs to be delivered,
> because that return value does not reflect whether the memory error handler
> (memory_failure()) handled the error successfully. If do_mem_abort() is to deliver
> the signal, do_sea() must always return an error so that do_mem_abort() always
> delivers it. We would then see the strange log shown below whenever a Synchronous External Abort happens.
> 
> [  676.700652] Synchronous External Abort: synchronous external abort (0x96000410) at 0x0000000033ff7008
> [  676.723301] Unhandled fault: synchronous external abort (0x96000410) at 0x0000000033ff7008
> 
> So I think it is better to send the signal in do_sea() rather than in do_mem_abort().

I agree: I think improving the commit message would help here, something like:
---------
do_sea() calls arm64_notify_die() which will always signal user-space.
It also returns whether APEI claimed the external abort as a RAS notification.
If it returns failure do_mem_abort() will signal user-space too.

do_mem_abort() wants to know if we handled the error. As we always call
arm64_notify_die(), do_sea() can always return success.
---------
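
The control flow that commit message describes can be modelled in user space. This is only a sketch: every model_* name, struct model and count_signals() are made up for illustration, the counter stands in for arm64_notify_die(), and the "patched" flag selects between the old 'return ret' and the new 'return 0'. It assumes do_mem_abort() signals again whenever the handler returns non-zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model state: how many signals user-space would see. */
struct model {
	int signals_sent;
};

/* Stand-in for ghes_notify_sea(): 0 if APEI claimed the abort as a
 * RAS notification, non-zero otherwise (the value is arbitrary). */
static int model_ghes_notify_sea(bool apei_claims)
{
	return apei_claims ? 0 : -1;
}

/* Stand-in for do_sea(): it always signals, then returns either
 * APEI's result (old code) or 0 (with the patch applied). */
static int model_do_sea(struct model *m, bool apei_claims, bool patched)
{
	int ret;

	assert(m != NULL);
	ret = model_ghes_notify_sea(apei_claims);
	m->signals_sent++;		/* arm64_notify_die() in do_sea() */
	return patched ? 0 : ret;
}

/* Stand-in for do_mem_abort(): a non-zero return from the handler
 * means "unhandled", so it signals as well. */
static void model_do_mem_abort(struct model *m, bool apei_claims, bool patched)
{
	if (!model_do_sea(m, apei_claims, patched))
		return;
	m->signals_sent++;		/* "Unhandled fault" path */
}

/* Hypothetical driver: count the signals for one external abort. */
static int count_signals(bool apei_claims, bool patched)
{
	struct model m = { 0 };

	model_do_mem_abort(&m, apei_claims, patched);
	return m.signals_sent;
}
```

In this model count_signals(false, false) is 2, which is the double SIGBUS from the subject line, while every combination with patched set yields exactly one signal.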

APEI's return value matters for KVM, and it will matter here too if we support
kernel-first.


> do_mem_abort() only sends the signal when the exception is not defined in fault_info[]. Another benefit
> is that do_sea() can send a different signal depending on the Synchronous External Abort type, such as
> SIGBUS or SIGKILL, whereas do_mem_abort() can only send one kind of signal.

(I'm not convinced we want to do this other than via the firmware/kernel RAS
code, but that is a separate issue)


Thanks,

James



