4.4 BCM5301X ARM regression "External imprecise Data abort"

Lucas Stach l.stach at pengutronix.de
Fri Apr 8 01:43:13 PDT 2016


On Friday, 08.04.2016, 08:45 +0200, Rafał Miłecki wrote:
> On 4 April 2016 at 23:23, Hauke Mehrtens <hauke at hauke-m.de> wrote:
> > On 04/04/2016 11:08 PM, Scott Branden wrote:
> >> On 16-04-03 11:13 PM, Rafał Miłecki wrote:
> >>> I got regression reports from Netgear R8000 (BCM4709A0) users and did
> >>> some testing & regression tracking with Aditya.
> >>>
> > >>> It turns out that Linux 4.4 doesn't boot due to the following commits:
> > >>> bbeb920 ("ARM: 8422/1: enable imprecise aborts during early kernel startup")
> > >>> 9254970 ("ARM: 8447/1: catch pending imprecise abort on unmask")
> > >>> 937b123 ("ARM: BCM5301X: remove workaround imprecise abort fault handler")
> >>>
> > >>> In kernel 4.3 we had that abort workaround, which resulted in:
> > >>> [    5.007128] Freeing unused kernel memory: 212K (c0435000 - c046a000)
> > >>> [    5.694632] init: Console is alive
> > >>> [    5.698169] init: - watchdog -
> > >>> [    5.701470] External imprecise Data abort at addr=0x0, fsr=0x1406 ignored.
> > >>> As you can see, this abort happened soon after freeing unused memory,
> > >>> and ignoring it *once* did the trick: it never appeared again.
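The 4.3 behaviour described above, ignoring exactly one imprecise abort, can be modelled in userspace roughly as follows. This is a sketch, not the removed kernel handler: the real one was registered through ARM's `hook_fault_code()` and also received a `struct pt_regs *`, which is left out here. `bcm5301x_abort_model` is a hypothetical name, and the FSR values it accepts are simply the two that appear in this thread. Returning 0 means "handled, resume"; non-zero lets the caller print the "Unhandled fault" message seen in the logs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Userspace model only: one-shot flag, as in the ignore-once workaround. */
static bool first_fault = true;

/*
 * Return 0 to mark the abort as handled (execution resumes),
 * non-zero to let the caller report "Unhandled fault".
 */
static int bcm5301x_abort_model(unsigned long addr, unsigned int fsr)
{
	if ((fsr == 0x1406 || fsr == 0x1c06) && first_fault) {
		first_fault = false;
		printf("External imprecise Data abort at addr=0x%lx, fsr=0x%x ignored.\n",
		       addr, fsr);
		return 0;
	}
	/* Any further (or different) abort goes to the default path. */
	return 1;
}
```

Because the flag is consumed by the first abort, a second abort in quick succession is reported as unhandled, which is exactly what happens when the workaround is re-added on 4.4.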
> >
> > I assume it can only hold one of these pending, and while aborts are
> > masked a later one is either ignored or overwrites it. So more than one
> > may have been thrown here.
> >
> > >>> With 4.4 a similar (or the same?) abort happens earlier (during PCI host
> > >>> driver init) and doesn't get ignored:
> >>> [    2.478461] pci 0000:00:00.0: PCI bridge to [bus 01]
> >>> [    2.483451] pci 0000:00:00.0:   bridge window [mem
> >>> 0x08000000-0x085fffff]
> >>> [    2.599449] pcie_iproc_bcma bcma0:8: PCI host bridge to bus 0001:00
> >>> [    2.605744] pci_bus 0001:00: root bus resource [mem
> >>> 0x40000000-0x47ffffff]
> >>> [    2.612657] pcie_iproc_bcma bcma0:8: link: UP
> >>> [    2.617241] PCI: bus0: Fast back to back transfers disabled
> >>> [    2.622845] pci 0001:00:00.0: bridge configuration invalid ([bus
> >>> 00-00]), reconfiguring
> >>> [    2.631297] PCI: bus1: Fast back to back transfers disabled
> >>> [    2.636887] pci 0001:01:00.0: bridge configuration invalid ([bus
> >>> 00-00]), reconfiguring
> >>> [    2.645035] Unhandled fault: imprecise external abort (0x1406) at
> >>> 0x00000000
> >>> (see 4.4.txt for the backtrace)
> >>>
> > >>> At first I hoped that we simply needed to re-add the removed
> > >>> workaround. I tried it, but it turned out that one abort is immediately
> > >>> followed by another:
> > >>> [    2.936895] pci 0001:01:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
> > >>> [    2.945053] External imprecise Data abort at addr=0x0, fsr=0x1406 ignored.
> > >>> [    2.951966] Unhandled fault: imprecise external abort (0x1406) at 0x00000000
> >>>
> > >>> So it seems that commits bbeb920 and 9254970 broke something in PCI
> > >>> host initialization (or maybe just exposed another bug?). Instead of
> > >>> getting one abort late, we now get many of them, and a bit earlier.
> >
> > These commits made the kernel "listen" to such errors earlier, so that
> > they are reported when they occur and not sometime later.
> 
> So AFAIU with kernel 4.3:
> 1) Aborts were masked (silent) until "Freeing unused kernel memory"
> 2) There was one (silent) abort caused by a bootloader
> 3) There were likely multiple aborts (silent) during early PCI init
> 4) After unmasking we got only a single abort reported and we were ignoring it
> 
> With kernel 4.4:
> 1) All aborts are reported immediately
> 2) The abort caused by the bootloader gets ignored by the ARM code:
> "Hit pending asynchronous external abort (FSR=0x00001c06) during first unmask"
> thanks to 9254970 ("ARM: 8447/1: catch pending imprecise abort on unmask")
> 3) There are still multiple aborts during PCI init (reported immediately now)
> 4) To work as before (in 4.3) we should ignore all aborts, not only the 1st one
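The summary above can be checked with a small userspace replay. The sketch counts how many aborts would reach the "Unhandled fault" path when PCI probing raises `n` imprecise aborts after unmasking, under the 4.3-style ignore-first policy and under the ignore-all policy from point 4. It assumes, as point 2 says, that the bootloader's pending abort is already consumed at first unmask and never reaches the handler. The enum and function names are hypothetical, and the model keeps counting, whereas in the logs above the first unhandled abort already stops the boot.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical policies matching points 4) of the 4.3 and 4.4 lists. */
enum abort_policy { IGNORE_FIRST, IGNORE_ALL };

/*
 * Count aborts that would be reported as unhandled when PCI probing
 * raises n_pci_aborts imprecise external aborts after unmasking.
 */
static int unhandled_aborts(enum abort_policy policy, int n_pci_aborts)
{
	bool ignored_one = false;
	int unhandled = 0;

	assert(n_pci_aborts >= 0);
	for (int i = 0; i < n_pci_aborts; i++) {
		if (policy == IGNORE_ALL)
			continue;               /* swallow every abort */
		if (!ignored_one) {
			ignored_one = true;     /* the single 4.3-style exception */
			continue;
		}
		unhandled++;                    /* would hit "Unhandled fault" */
	}
	return unhandled;
}
```

With two PCI aborts, as in the re-added-workaround log, ignore-first leaves one unhandled and ignore-all leaves none, which is why only the latter restores the 4.3 behaviour.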
> 
> Of course, the proposed solution is an ugly workaround; we should have no
> aborts reported in the first place.
> 
A master abort on the PCI bus during probe of the PCI config space
(device enumeration) is expected. Most host bridges ignore those errors
and just return 0 for the read transaction.
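To illustrate why such a read is harmless when the bridge hides the master abort: enumeration reads the first config-space dword (vendor ID in bits 15:0, device ID in bits 31:16) and treats certain fixed values as "no device". The sketch below is modelled on the vendor-ID check in the Linux PCI core (`drivers/pci/probe.c`), which, to the best of our knowledge, accepts both all-zeros and all-ones patterns as an empty slot. `slot_present` is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Decide whether a device answered, given the dword read from
 * config offset 0 (vendor ID | device ID << 16).
 */
static bool slot_present(uint32_t id_dword)
{
	/* Empty slots (or broken bridges) commonly read back 0 or ~0. */
	if (id_dword == 0xffffffffu || id_dword == 0x00000000u ||
	    id_dword == 0x0000ffffu || id_dword == 0xffff0000u)
		return false;
	return true;
}
```

As long as the bridge completes the aborted read with one of these values instead of signalling an error upstream, probing an empty slot is just a normal "nothing here" result.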

Some bridges forward the error onto the AXI/AMBA bus and thus cause
imprecise external aborts on the ARM core. If your host bridge doesn't
have a way to disable error forwarding during PCI bus probe, you need to
install an abort handler. Most implementations based on the DesignWare
PCIe core do this already.
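A host driver that cannot disable error forwarding might install a handler along these lines. This is a userspace model under stated assumptions: in the kernel the hook would be registered with `hook_fault_code(16 + 6, handler, SIGBUS, 0, "imprecise external abort")`, and the real handler also receives a `struct pt_regs *`. The `pcie_probing` flag and the function names are hypothetical, and limiting the handler to an enumeration window is a design choice, not something prescribed in this thread. The index 16 + 6 comes from the ARMv7 short-descriptor FSR layout, where the fault status is FSR[10]:FSR[3:0]; both 0x1406 and 0x1c06 from the logs decode to 0x16, the imprecise external abort.

```c
#include <assert.h>
#include <stdbool.h>

#define FSR_FS4                (1u << 10)  /* FS[4] lives in FSR bit 10 */
#define FSR_FS3_0              0xfu        /* FS[3:0] in FSR bits 3:0 */
#define FS_IMPRECISE_EXT_ABORT (16 + 6)

/* Hypothetical flag a host driver would set around bus enumeration. */
static bool pcie_probing;

/* Extract the 5-bit fault status from a short-descriptor DFSR value. */
static unsigned int fsr_fs(unsigned int fsr)
{
	return (fsr & FSR_FS3_0) | ((fsr & FSR_FS4) >> 6);
}

/* Return 0 if the abort is swallowed, non-zero to have it reported. */
static int pcie_abort_model(unsigned long addr, unsigned int fsr)
{
	(void)addr; /* not meaningful for imprecise aborts (logs show 0x0) */
	if (fsr_fs(fsr) == FS_IMPRECISE_EXT_ABORT && pcie_probing)
		return 0;
	return 1;
}
```

Keeping the window narrow means a genuine abort outside enumeration still produces the "Unhandled fault" report instead of being silently dropped. Since an imprecise abort is asynchronous and can be signalled after the access that caused it, a real driver may need a wider window than this model suggests.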

Regards,
Lucas




More information about the linux-arm-kernel mailing list