SError Interrupt on CPU0, code 0xbf000000 makes kernel panic

Robin Murphy robin.murphy at arm.com
Thu Mar 24 08:54:13 PDT 2022


On 2022-03-24 15:42, Joakim Tjernlund wrote:
> On Thu, 2022-03-24 at 15:25 +0000, Marc Zyngier wrote:
>> On Thu, 24 Mar 2022 15:11:42 +0000,
>> Joakim Tjernlund <Joakim.Tjernlund at infinera.com> wrote:
>>>
>>> On Thu, 2022-03-24 at 15:05 +0000, Robin Murphy wrote:
>>
>>>> Well, except when it is... try that on a Qualcomm SoC where the EL2
>>>> firmware will trap you and reset the system before you even know you've
>>>> done anything wrong. If you know enough to know that an error triggered
>>>> by accessing some address is truly benign, you know enough to avoid
>>>> making that access in the first place.
>>>
>>> Of course the error will be dealt with, but why make bug finding
>>> harder than it has to be?
>>
>> Maybe that was not clear enough from our earlier replies. Let me try
>> again.
>>
>> There is *nothing* more the kernel can do. We don't even know what
>> caused the access (read, write, earthquake or foreign power invasion).
>>
>> By the time we get the SError interrupt, we could well be running
>> something altogether different because all of that is totally
>> asynchronous *by nature*. You're just lucky that you get the response
>> quickly enough that the kernel is still running the offending
>> userspace.
> 
> I worked on PPC earlier, and there I am used to getting an exception (MachineCheck) with the PC and data address
> for similar cases, and can usually pass that on to user space as a SIGBUS while the kernel moves along.
> 
> It seems ARM works very differently and pulls the plug immediately; I just find that odd.

Linux necessarily has to operate within the bounds of the architecture 
on which it's running, while you as an external observer of the entire 
system do not. If you find it inconvenient that Linux handles an 
unattributable error by not attributing it to the cause that your 
higher-level, comparatively omnipotent knowledge can, and you are 
confident that on *your* system in *your* debugging scenario, there are 
no other possible sources of unattributable errors, then feel free to 
hack Linux locally to not panic on an unattributable error. Just 
understand why it's a local hack and you won't be sending a patch upstream.
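For what it's worth, the "code" in that panic message is, as I understand it, just the ESR_ELx value, and decoding it shows how little there is to go on. Below is a rough sketch, assuming the Armv8-A layout of the low 32 bits of ESR_ELx; the function and table names are made up for illustration and are not kernel code. For 0xbf000000 the EC is 0x2F (SError) and the IDS bit is set, so the remaining syndrome is implementation defined: there is no address, no faulting instruction, and no architectural error type to classify.

```python
# Hypothetical decoder for the low 32 bits of ESR_ELx, assuming the
# Armv8-A field layout. Not taken from the kernel; illustration only.
EC_SERROR = 0x2F          # Exception Class for an SError interrupt
DFSC_ASYNC_SERROR = 0x11  # DFSC value meaning "asynchronous SError interrupt"

# Asynchronous Error Type (AET) encodings, valid only when IDS == 0
# and DFSC == 0b010001.
AET_NAMES = {
    0b000: "UC (uncontainable)",
    0b001: "UEU (unrecoverable)",
    0b010: "UEO (restartable)",
    0b011: "UER (recoverable)",
    0b110: "CE (corrected)",
}


def decode_serror_esr(esr: int) -> dict:
    ec = (esr >> 26) & 0x3F   # EC, bits [31:26]
    il = (esr >> 25) & 0x1    # IL, bit [25]
    iss = esr & 0x1FFFFFF     # ISS, bits [24:0]
    info = {"ec": ec, "il": il, "iss": iss, "is_serror": ec == EC_SERROR}
    if ec != EC_SERROR:
        return info           # some other exception class; not decoded here

    ids = (iss >> 24) & 0x1   # IDS, bit [24]: syndrome is implementation defined
    info["ids"] = ids
    if ids:
        # Nothing architectural left to interpret, so the error is uncategorized.
        info["classification"] = "implementation defined (uncategorized)"
        return info

    dfsc = iss & 0x3F         # DFSC, bits [5:0]
    info["dfsc"] = dfsc
    if dfsc != DFSC_ASYNC_SERROR:
        info["classification"] = "uncategorized"
        return info

    aet = (iss >> 10) & 0x7   # AET, bits [12:10]
    info["aet"] = aet
    info["classification"] = AET_NAMES.get(aet, "reserved")
    return info


if __name__ == "__main__":
    print(decode_serror_esr(0xbf000000))
```

Even in the best case (IDS clear, a RAS-style error type present), the syndrome only says how bad the error is, not who caused it, which is why the handler can't turn it into a targeted SIGBUS.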

Robin.



More information about the linux-arm-kernel mailing list