[bugzilla-daemon at kernel.org: [Bug 217276] New: Kernel panic - not syncing: Asynchronous SError Interrupt (brcm_pcie_probe), with Raspberry Pi CM4 + PCIe setups]

Bjorn Helgaas helgaas at kernel.org
Thu Mar 30 17:59:36 PDT 2023


Thanks very much for the report!  I'm forwarding it to maintainers and
lists, since most don't monitor bugzilla.

----- Forwarded message from bugzilla-daemon at kernel.org -----

https://bugzilla.kernel.org/show_bug.cgi?id=217276

Created attachment 304062
  --> https://bugzilla.kernel.org/attachment.cgi?id=304062&action=edit
dmesg with master (b2bc47e9b201)

Hi,

This bug can be tricky to reproduce, since hitting or dodging it seems very
much dependent on the actual chips and revisions of all involved components.

The general setup is:

- Raspberry Pi Compute Module 4
- Raspberry Pi Compute Module 4 IO Board (carrier board)
- Something plugged into the PCIe slot

At the moment, I'm able to reproduce this issue reliably with:

- Compute Module 4 including eMMC (Compute Module 4 Lite, without eMMC, using
the exact same operating system image on an SD card, doesn't trigger the
issue).
- SupaHub PCIe-to-multiple-USB adapter, reference PCE6U1C-R02, VER 006S
(PCE6U1C-R02, VER 006 looks very similar, but definitely includes different
chips on its PCB, and doesn't trigger the issue).

With either v6.1.20 as packaged by Debian, or with a local master build (as of
b2bc47e9b201), plus a Debian testing userspace, I'm hitting the following
kernel panic:

```
[    1.914315] Kernel panic - not syncing: Asynchronous SError Interrupt
[    1.914317] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.3.0-rc4+ #1
[    1.914322] Hardware name: Raspberry Pi Compute Module 4 Rev 1.1 (DT)
[    1.914324] Call trace:
[    1.914326]  dump_backtrace+0xa8/0x138
[    1.914333]  show_stack+0x20/0x38
[    1.914336]  dump_stack_lvl+0x48/0x60
[    1.914345]  dump_stack+0x18/0x28
[    1.914350]  panic+0x378/0x398
[    1.914355]  nmi_panic+0xb4/0xc0
[    1.914359]  arm64_serror_panic+0x78/0x90
[    1.914363]  do_serror+0x30/0x70
[    1.914367]  el1h_64_error_handler+0x30/0x48
[    1.914371]  el1h_64_error+0x64/0x68
[    1.914375]  pci_generic_config_read+0x44/0xe8
[    1.914380]  pci_bus_read_config_dword+0x98/0x140
[    1.914386]  pci_bus_generic_read_dev_vendor_id+0x3c/0x1c0
[    1.914390]  pci_scan_single_device+0xa8/0x118
[    1.914393]  pci_scan_slot+0x6c/0x1e0
[    1.914396]  pci_scan_child_bus_extend+0x50/0x2e0
[    1.914399]  pci_scan_bridge_extend+0x31c/0x5a8
[    1.914403]  pci_scan_child_bus_extend+0x1c4/0x2e0
[    1.914406]  pci_scan_root_bus_bridge+0x6c/0xf8
[    1.914409]  pci_host_probe+0x20/0xd0
[    1.914413]  brcm_pcie_probe+0x294/0x618
[    1.914419]  platform_probe+0x70/0xe8
[    1.914426]  really_probe+0x18c/0x3d8
[    1.914429]  __driver_probe_device+0x84/0x198
[    1.914434]  driver_probe_device+0x44/0x120
[    1.914437]  __driver_attach+0xfc/0x210
[    1.914441]  bus_for_each_dev+0x7c/0xe8
[    1.914445]  driver_attach+0x2c/0x40
[    1.914448]  bus_add_driver+0x118/0x228
[    1.914452]  driver_register+0x68/0x138
[    1.914456]  __platform_driver_register+0x30/0x48
[    1.914461]  brcm_pcie_driver_init+0x24/0x38
[    1.914468]  do_one_initcall+0x4c/0x238
[    1.914472]  kernel_init_freeable+0x21c/0x3f0
[    1.914479]  kernel_init+0x2c/0x1f8
[    1.914483]  ret_from_fork+0x10/0x20
```
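
For anyone comparing this trace against others (for instance the v6.1.20 one), here is a minimal Python sketch that picks out the frame sitting just below the exception-entry frames. The helper names (`frames`, `interrupted_frame`) and the regular expression are illustrative assumptions, not part of any kernel tooling. Since the SError is asynchronous, the frame found this way only shows where the CPU was when the error was delivered, which is not necessarily the access that caused it.

```python
import re

# Assumed line shape, taken from the trace above:
# "[    1.914375]  pci_generic_config_read+0x44/0xe8"
FRAME_RE = re.compile(
    r"^\[\s*\d+\.\d+\]\s+([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+$"
)

# Excerpt copied from the call trace in this report.
SAMPLE = """\
[    1.914363]  do_serror+0x30/0x70
[    1.914367]  el1h_64_error_handler+0x30/0x48
[    1.914371]  el1h_64_error+0x64/0x68
[    1.914375]  pci_generic_config_read+0x44/0xe8
[    1.914380]  pci_bus_read_config_dword+0x98/0x140
"""


def frames(trace):
    """Return the symbol names of call-trace frames, innermost first."""
    names = []
    for line in trace.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            names.append(match.group(1))
    return names


def interrupted_frame(names, entry_prefix="el1h_64_error"):
    """Return the first frame below the last exception-entry frame.

    Returns None if no entry frame is found or nothing follows it.
    """
    last_entry = None
    for index, name in enumerate(names):
        if name.startswith(entry_prefix):
            last_entry = index
    if last_entry is None or last_entry + 1 >= len(names):
        return None
    return names[last_entry + 1]


if __name__ == "__main__":
    print(interrupted_frame(frames(SAMPLE)))  # pci_generic_config_read
```

Fed the full trace above, the same helper would report `pci_generic_config_read`, reached from `pci_scan_single_device` during `brcm_pcie_probe`.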

Full dmesg captured from b2bc47e9b201 is attached; I'll follow up with a very
similar trace using v6.1.20.

Serial logging is set up as follows, should that matter:

- "earlycon console=ttyS1,115200" on the kernel command line;
- "enable_jtag_gpio=1" and "force_turbo=1" in config.txt (consumed by the
bootloader);
- and pins 6, 8, 10 on the pin header hooked up to a cp210x-based serial
adapter.


Reminder: there was some discussion around the possible need for a subnode in
the DTB when I filed the PCIe regression a while back
(https://bugzilla.kernel.org/show_bug.cgi?id=215925).

I'm happy to test any patches and provide any input you folks might need.


Cheers,
Cyril.


