X-Gene: Unhandled fault: synchronous external abort in pci_generic_config_read32

Bjorn Helgaas bhelgaas at google.com
Mon Aug 10 10:42:19 PDT 2015


On Mon, Aug 10, 2015 at 12:16 PM, Duc Dang <dhdang at apm.com> wrote:
> On Monday, August 10, 2015, Bjorn Helgaas <bhelgaas at google.com> wrote:
>>
>> On Fri, Jul 31, 2015 at 12:00 PM, Duc Dang <dhdang at apm.com> wrote:
>> > On Wed, Jul 29, 2015 at 8:55 AM, Bjorn Helgaas <bhelgaas at google.com>
>> > wrote:
>> >> On Tue, Jul 28, 2015 at 08:22:55PM -0500, Bjorn Helgaas wrote:
>> >>> On Tue, Jul 28, 2015 at 02:50:39PM -0700, Duc Dang wrote:
>> >>
>> >>> > Do you have another PCIe card to try on the same reboot test on this
>> >>> > board?
>> >>>
>> >>> I've seen this on at least two Mellanox cards.  I'm running similar
>> >>> tests
>> >>> on a different type of card now.
>> >>
>> >> FWIW, reboot tests on two machines with Mellanox cards failed, while
>> >> the
>> >> same test on a machine with a different proprietary card succeeded.
>> >
>> > Thanks, Bjorn.
>> >
>> > I don't have the same Mellanox card as yours, but I will also run a
>> > similar reboot test to see if I hit the same issue with my card.
>>
>> Any more hints on this?  Nothing has changed on my end, so of course
>> I'm still seeing this, always on machines with Mellanox, and never on
>> other machines.  Could this be a hardware issue like a signal
>> integrity or margin issue?  I don't know where to go from here because
>> I'm not a hardware person, and I don't know anything to do in
>> software.
>
>
> Hi Bjorn,
>
> I tried to run similar reboot tests on 2 different Mellanox cards (Connect-X
> family, one card has 2 10G interfaces, the other one has 1 port that
> supports InfiniBand) with U-Boot 1.15.12 and Linux 4.2-rc5 and I did not see
> the crash that you encountered.
>
> Did you check if your Mellanox cards have the latest firmware? I did see some
> link issues on my Mellanox cards with their old firmware before.

Good idea; I'll check that, too.  Also, I just learned that these
cards are installed with an extender card because of some space issues,
so we're going to test again without the extender.



More information about the linux-arm-kernel mailing list