[PATCH V7 04/10] arm64: exception: handle Synchronous External Abort

James Morse james.morse at arm.com
Thu Jan 19 09:55:22 PST 2017

Hi Tyler,

On 18/01/17 23:26, Baicar, Tyler wrote:
> On 1/17/2017 3:31 AM, James Morse wrote:
>> On 12/01/17 18:15, Tyler Baicar wrote:
>>> SEA exceptions are often caused by an uncorrected hardware
>>> error, and are handled when data abort and instruction abort
>>> exception classes have specific values for their Fault Status
>>> Code.
>>> When SEA occurs, before killing the process, go through
>>> the handlers registered in the notification list.
>>> Update fault_info[] with specific SEA faults so that the
>>> new SEA handler is used.
>>> @@ -480,6 +496,28 @@ static int do_bad(unsigned long addr, unsigned int esr,
>>> struct pt_regs *regs)
>>>       return 1;
>>>   }
>>>   +/*
>>> + * This abort handler deals with Synchronous External Abort.
>>> + * It calls notifiers, and then returns "fault".
>>> + */
>>> +static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs)
>>> +{
>>> +    struct siginfo info;
>>> +
>>> +    atomic_notifier_call_chain(&sea_handler_chain, 0, NULL);
>>> +
>>> +    pr_err("Synchronous External Abort: %s (0x%08x) at 0x%016lx\n",
>>> +         fault_name(esr), esr, addr);
>>> +
>>> +    info.si_signo = SIGBUS;
>>> +    info.si_errno = 0;
>>> +    info.si_code  = 0;
>> Half of the other do_*() functions in this file read the signo and code from the
>> fault_info table.
>>> +    info.si_addr  = (void __user *)addr;
>> addr here was read from FAR_EL1, but for some of the classes of exception you
>> have listed below this register isn't updated with the faulting address.
>> The ARM-ARM version 'k' in D1.10.5 "Summary of registers on faults taken to an
>> Exception level that is using Aarch64" has:
>>> The architecture permits that the FAR_ELx is UNKNOWN for Synchronous External
>>> Aborts other than Synchronous External Aborts on Translation Table Walks. In
>>> this case, the ISS.FnV bit returned in ESR_ELx  indicates whether FAR_ELx is
>>> valid.
>> This is a problem if we get 'synchronous external abort' or 'synchronous parity
>> error' while a user space process was running.

> It looks like this would just cause an incorrect address to be printed in the
> above pr_err.
> Unless I'm missing something, I don't see arm64_notify_die or anything that gets
> called from
> there using the info.si_addr variable.

I may be misreading something here...

This patch has:
>	info.si_addr  = (void __user *)addr;
>	arm64_notify_die("", regs, &info, esr);

>From arch/arm64/kernel/traps.c:arm64_notify_die():
>	if (user_mode(regs)) {
>		current->thread.fault_address = 0;
>		current->thread.fault_code = err;
>		force_sig_info(info->si_signo, info, current);
>	}

So if the SEA interrupted userspace, we put maybe-unknown addr into
force_sig_info() to deliver a signal to user space. User-space then gets a copy
of the info struct containing the maybe-unknown addr.

I think this is an existing bug, but if we are separating the synchronous
external aborts from the generic do_bad handler, we should probably check the
FnV bit. (I think we should still print it out)

> What do you suggest I do here? The firmware should be reporting the physical and
> virtual
> address information if it is available in the HEST entry that the kernel will
> parse.

Its not just firmware that may trigger this, other SoCs may use it for parity or
ECC errors, and they may not always have a valid address in FAR_EL1.

I think we should check the FnV bit in the esr variable and set info.si_addr to
0 if the addr we have isn't valid:
'For some implementations, the value of si_addr may be inaccurate.' [0]



[0] http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/signal.h.html

More information about the linux-arm-kernel mailing list