X-Gene: Unhandled fault: synchronous external abort in pci_generic_config_read32

Bjorn Helgaas bhelgaas at google.com
Tue Jul 28 11:36:02 PDT 2015


On Tue, Jul 28, 2015 at 12:39 PM, Duc Dang <dhdang at apm.com> wrote:
> On Mon, Jul 27, 2015 at 4:36 AM, Catalin Marinas
> <catalin.marinas at arm.com> wrote:
>> On Fri, Jul 24, 2015 at 05:05:19PM -0700, Duc Dang wrote:
>>> On Fri, Jul 24, 2015 at 3:42 PM, Bjorn Helgaas <bhelgaas at google.com> wrote:
>>> > I regularly see faults like this on an APM X-Gene:
>>> >
>>> >   U-Boot 2013.04-mustang_sw_1.14.14 (Dec 16 2014 - 15:59:33)
>>> >   CPU0: APM ARM 64-bit Potenza Rev B0 2400MHz PCP 2400MHz
>>> >        32 KB ICACHE, 32 KB DCACHE
>>> >        SOC 2000MHz IOBAXI 400MHz AXI 250MHz AHB 200MHz GFC 125MHz
>>> >   ...
>>> >   Unhandled fault: synchronous external abort (0x96000010) at 0xffffff8000110034
>>
>> That's generated by an external device (PCIe root complex, card, etc.)
>> and some misconfigured CPU setting.
>>
>>> >   Internal error: : 96000010 [#1] SMP
>>> >   Modules linked in:
>>> >   CPU: 0 PID: 3723 Comm: ... 4.1.0-smp-DEV #3
>>> >   Hardware name: APM X-Gene Mustang board (DT)
>>> >   task: ffffffc7dc1a4140 ti: ffffffc7dc118000 task.ti: ffffffc7dc118000
>>> >   PC is at pci_generic_config_read32+0x4c/0xb8
>>> >   LR is at pci_generic_config_read32+0x40/0xb8
>>> >   pc : [<ffffffc00033b90c>] lr : [<ffffffc00033b900>] pstate: 600001c5
>>> >   ...
>>> >   Call trace:
>>> >   [<ffffffc00033b90c>] pci_generic_config_read32+0x4c/0xb8
>>> >   [<ffffffc00033bf58>] pci_user_read_config_byte+0x60/0xc4
>>> >   [<ffffffc0003496a8>] pci_read_config+0x15c/0x238
>>> >   [<ffffffc0002393b4>] sysfs_kf_bin_read+0x68/0xa0
>>> >   [<ffffffc00023896c>] kernfs_fop_read+0x9c/0x1ac
>>> >   [<ffffffc0001c361c>] __vfs_read+0x44/0x128
>>> >   [<ffffffc0001c3e28>] vfs_read+0x84/0x144
>>> >   [<ffffffc0001c4764>] SyS_read+0x50/0xb0
>>>
>>> The log shows the kernel takes an exception when trying to access the
>>> Mellanox card's configuration space. This is usually due to suboptimal
>>> PCIe SerDes parameters being used on your board, which cause poor link
>>> quality.
>>
>> I would have hoped that "suboptimal" means that it still works, albeit
>> not fully optimal ;).
>
> Yes, it should still work, but you may see crashes occasionally due to
> link quality.

A crash seems like too severe a response to a link quality issue.
Isn't there some way to retry the access or return an error, so we
don't have to crash the whole system?
