[RFC PATCH 1/3] PCI: rockchip: provide workaround for bus scan crash with optional delay
Jari Hämäläinen
nuumiofi at gmail.com
Sat Jan 2 08:43:37 EST 2021
On Fri, Jan 1, 2021 at 7:37 PM Bjorn Helgaas <helgaas at kernel.org> wrote:
>
> On Thu, Dec 31, 2020 at 02:52:12PM +0200, Jari Hämäläinen wrote:
> > Some PCIe devices cause Rockchip PCIe controller to crash in bus scan.
> > Crash may be avoided by delaying bus scan by time given from command line
> > or from device tree. Needed amount of delay varies from device to device.
> > Delay doesn't have to be exact. It just has to be long enough.
>
> Is this a standard post-reset delay that the rockchip driver is
> missing? Maybe compare with other drivers to see if rockchip is
> missing something.
>
> Is this an erratum in the Rockchip IP? If so, we should have a
> specific description and a citation for it, and a workaround could be
> done automatically without DT or command-line switches.
Thanks for your reply!
This patch was not based on a Rockchip erratum or other documentation. It
was found by a lucky shot when trying to get Rockchip PCIe working with
these devices. I'm sorry for not mentioning that in the first place. In
that sense "a hack" would be a better description than "a workaround".
I'll look at other drivers and see if I can spot anything missing from
Rockchip. The DesignWare driver seems like a good place to start. I'm a
newbie in kernel hacking and even more so with PCIe, so pointers are welcome.
> > The following lists few problematic PCIe devices with delays needed for
> > stable bus scan surviving 100 sequential reboots in test loop executed on
> > RockPro64 (RK3399 single-board computer):
> > - LSI 9201-8i / SAS2008 chipset [1000:0072]: 725 ms
> > - LSI 9302-8i / SAS3008 chipset [1000:0097]: 575 ms (1)
> > - HP H220 / SAS2308 chipset [1000:0087]: 800 ms (2)
> > - IBM ServeRAID M5110 / SAS2208 chipset [1000:005b]: 1050 ms (3)
> >
> > 1) mpt3sas module has soft lockup bug on shutdown but device is usable
> > 2) has infrequent crash on mpt3sas module load (2 of 662 reboots in all
> > test sessions with this device crashed on module load)
> > 3) megaraid_sas module crashes on load so device remains unusable
> > (bus scan tested with module being blacklisted)
> >
> > Side effect of delay, if set, is that it slows down system startup by the
> > amount of delay.
> >
> > Log excerpt showing a crash happening always on unpatched kernel with
> > problematic PCIe devices listed above rendering them unusable:
>
> It doesn't seem likely that the devices above are broken since we
> don't have problems with them on other systems. More likely to be
> some Rockchip-specific thing, and the devices above are operating
> within spec (possibly using more of the allowed post-reset time than
> most devices).
This does indeed seem to be Rockchip-specific. All of the devices above
worked fine on an x86-based setup.
> > [ 1.240649] SError Interrupt on CPU5, code 0xbf000002 -- SError
>
> We really should know more about what the specific error is. Most
> errors on PCIe should be recoverable and they can happen at any time,
> not just at boot-time.
>
> This patch adds a boot-time delay. At run-time, if we power-cycle or
> reset a device and re-enumerate the bus, we would likely see the same
> problem and this patch wouldn't help.
I will try to dig deeper into the details of this error. Maybe the
megaraid_sas driver crashing on module load is a manifestation of the same
problem and could offer a hint, or at least another viewpoint on what is
going on.
> > [ 1.240653] CPU: 5 PID: 1 Comm: swapper/0 Not tainted 5.10.2-stable #1
> > [ 1.240656] Hardware name: Pine64 RockPro64 v2.0 (DT)
> > [ 1.240659] pstate: 60000085 (nZCv daIf -PAN -UAO -TCO BTYPE=--)
> > [ 1.240661] pc : rockchip_pcie_rd_conf+0x178/0x268
> > [ 1.240664] lr : rockchip_pcie_rd_conf+0x1b8/0x268
> > [ 1.240666] sp : ffff8000119db850
> > [ 1.240669] x29: ffff8000119db850 x28: 0000000000000000
> > [ 1.240676] x27: 0000000000000000 x26: 0000000000000000
> > [ 1.240682] x25: ffff8000119db984 x24: 0000000000000000
> > [ 1.240688] x23: 0000000000000000 x22: ffff000040ba0b80
> > [ 1.240694] x21: ffff8000119db8d4 x20: 0000000000000004
> > [ 1.240700] x19: 0000000000100000 x18: ffffffffffffffff
> > [ 1.240706] x17: 0000000031cae143 x16: 000000008c75157c
> > [ 1.240712] x15: ffff800011729908 x14: ffff000040c87a1c
> > [ 1.240718] x13: ffff000040c87293 x12: 0000000000000038
> > [ 1.240724] x11: 0000000005f5e0ff x10: 7f7f7f7f7f7f7f7f
> > [ 1.240729] x9 : 0000000001001d87 x8 : 000000000000ea60
> > [ 1.240735] x7 : ffff8000119db984 x6 : 0000000000000000
> > [ 1.240741] x5 : 0000000000000000 x4 : 0000000000c00008
> > [ 1.240747] x3 : ffff800017000000 x2 : 000000000080000a
> > [ 1.240753] x1 : 0000000000000000 x0 : ffff800014000000
> > [ 1.240759] Kernel panic - not syncing: Asynchronous SError Interrupt
> > [ 1.240763] CPU: 5 PID: 1 Comm: swapper/0 Not tainted 5.10.2-stable #1
> > [ 1.240765] Hardware name: Pine64 RockPro64 v2.0 (DT)
> > [ 1.240768] Call trace:
> > [ 1.240770] dump_backtrace+0x0/0x1e8
> > [ 1.240772] show_stack+0x18/0x60
> > [ 1.240775] dump_stack+0xd8/0x130
> > [ 1.240777] panic+0x15c/0x380
> > [ 1.240779] add_taint+0x0/0xb0
> > [ 1.240782] arm64_serror_panic+0x78/0x88
> > [ 1.240784] do_serror+0x3c/0x68
> > [ 1.240787] el1_error+0x84/0x104
> > [ 1.240789] rockchip_pcie_rd_conf+0x178/0x268
> > [ 1.240791] pci_bus_read_config_dword+0xa4/0x150
> > [ 1.240794] pci_bus_generic_read_dev_vendor_id+0x30/0x1b0
> > [ 1.240797] pci_bus_read_dev_vendor_id+0x4c/0x78
> > [ 1.240800] pci_scan_single_device+0x80/0x100
> > [ 1.240802] pci_scan_slot+0x38/0x130
> > [ 1.240805] pci_scan_child_bus_extend+0x58/0x348
> > [ 1.240807] pci_scan_bridge_extend+0x304/0x5a0
> > [ 1.240810] pci_scan_child_bus_extend+0x20c/0x348
> > [ 1.240812] pci_scan_root_bus_bridge+0x64/0xf0
> > [ 1.240815] pci_host_probe+0x18/0xc8
> > [ 1.240817] rockchip_pcie_probe+0x34c/0x4b8
> > [ 1.240820] platform_drv_probe+0x54/0xa8
> > [ 1.240822] really_probe+0x29c/0x4f8
> > [ 1.240824] driver_probe_device+0xfc/0x168
> > [ 1.240827] device_driver_attach+0x74/0x80
> > [ 1.240829] __driver_attach+0xb8/0x168
> > [ 1.240832] bus_for_each_dev+0x7c/0xd8
> > [ 1.240834] driver_attach+0x24/0x30
> > [ 1.240837] bus_add_driver+0x15c/0x240
> > [ 1.240839] driver_register+0x64/0x120
> > [ 1.240841] __platform_driver_register+0x44/0x50
> > [ 1.240844] rockchip_pcie_driver_init+0x1c/0x28
> > [ 1.240846] do_one_initcall+0x60/0x1d8
> > [ 1.240849] kernel_init_freeable+0x234/0x2b4
> > [ 1.240851] kernel_init+0x14/0x118
> > [ 1.240854] ret_from_fork+0x10/0x34
> > [ 1.240878] SMP: stopping secondary CPUs
> > [ 1.240881] Kernel Offset: disabled
> > [ 1.240883] CPU features: 0x0240022,2100200c
> > [ 1.240886] Memory Limit: none
> >
> > Signed-off-by: Jari Hämäläinen <nuumiofi at gmail.com>
> > ---
> > .../admin-guide/kernel-parameters.txt | 8 ++++++++
> > drivers/pci/controller/pcie-rockchip-host.c | 18 ++++++++++++++++++
> > drivers/pci/controller/pcie-rockchip.c | 5 +++++
> > drivers/pci/controller/pcie-rockchip.h | 2 ++
> > 4 files changed, 33 insertions(+)
> >
> > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> > index c722ec19cd00..fda9bb9c85c3 100644
> > --- a/Documentation/admin-guide/kernel-parameters.txt
> > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > @@ -3823,6 +3823,14 @@
> > nomsi Do not use MSI for native PCIe PME signaling (this makes
> > all PCIe root ports use INTx for all services).
> >
> > + pcie_rockchip_host.bus_scan_delay_ms=
> > + [PCIE] delay before PCIe bus scan in milliseconds.
> > + If set to greater than or equal to 0 this parameter will
> > + override delay set in device tree. Values less than 0
> > + are ignored. This parameter provides a workaround for
> > + some devices causing a crash in bus scan.
> > + Default: -1
> > +
> > pcmv= [HW,PCMCIA] BadgePAD 4
> >
> > pd_ignore_unused
> > diff --git a/drivers/pci/controller/pcie-rockchip-host.c b/drivers/pci/controller/pcie-rockchip-host.c
> > index f1d08a1b1591..14733c92b25c 100644
> > --- a/drivers/pci/controller/pcie-rockchip-host.c
> > +++ b/drivers/pci/controller/pcie-rockchip-host.c
> > @@ -24,6 +24,7 @@
> > #include <linux/kernel.h>
> > #include <linux/mfd/syscon.h>
> > #include <linux/module.h>
> > +#include <linux/moduleparam.h>
> > #include <linux/of_address.h>
> > #include <linux/of_device.h>
> > #include <linux/of_pci.h>
> > @@ -39,6 +40,9 @@
> > #include "../pci.h"
> > #include "pcie-rockchip.h"
> >
> > +static int bus_scan_delay_ms = -1;
> > +module_param(bus_scan_delay_ms, int, 0444);
> > +
> > static void rockchip_pcie_enable_bw_int(struct rockchip_pcie *rockchip)
> > {
> > u32 status;
> > @@ -941,6 +945,7 @@ static int rockchip_pcie_probe(struct platform_device *pdev)
> > struct device *dev = &pdev->dev;
> > struct pci_host_bridge *bridge;
> > int err;
> > + u32 delay = 0;
> >
> > if (!dev->of_node)
> > return -ENODEV;
> > @@ -992,6 +997,19 @@ static int rockchip_pcie_probe(struct platform_device *pdev)
> > bridge->sysdata = rockchip;
> > bridge->ops = &rockchip_pcie_ops;
> >
> > + /*
> > + * Work around a crash caused by some devices on bus scan by applying a
> > + * delay if one is given. Prefer command line value over device tree.
> > + */
> > + if (bus_scan_delay_ms >= 0)
> > + delay = bus_scan_delay_ms;
> > + else
> > + delay = rockchip->bus_scan_delay_ms;
> > + if (delay > 0) {
> > + dev_info(dev, "delay bus scan for %u ms\n", delay);
> > + msleep(delay);
> > + }
> > +
> > err = pci_host_probe(bridge);
> > if (err < 0)
> > goto err_remove_irq_domain;
> > diff --git a/drivers/pci/controller/pcie-rockchip.c b/drivers/pci/controller/pcie-rockchip.c
> > index 904dec0d3a88..2e49e9204894 100644
> > --- a/drivers/pci/controller/pcie-rockchip.c
> > +++ b/drivers/pci/controller/pcie-rockchip.c
> > @@ -149,6 +149,11 @@ int rockchip_pcie_parse_dt(struct rockchip_pcie *rockchip)
> > return PTR_ERR(rockchip->clk_pcie_pm);
> > }
> >
> > + err = of_property_read_u32(node, "rockchip,bus-scan-delay-ms",
> > + &rockchip->bus_scan_delay_ms);
> > + if (err)
> > + rockchip->bus_scan_delay_ms = 0;
> > +
> > return 0;
> > }
> > EXPORT_SYMBOL_GPL(rockchip_pcie_parse_dt);
> > diff --git a/drivers/pci/controller/pcie-rockchip.h b/drivers/pci/controller/pcie-rockchip.h
> > index 1650a5087450..18f37820b35b 100644
> > --- a/drivers/pci/controller/pcie-rockchip.h
> > +++ b/drivers/pci/controller/pcie-rockchip.h
> > @@ -300,6 +300,8 @@ struct rockchip_pcie {
> > phys_addr_t msg_bus_addr;
> > bool is_rc;
> > struct resource *mem_res;
> > + /* bus scan delay for crash causing devices' workaround */
> > + u32 bus_scan_delay_ms;
> > };
> >
> > static u32 rockchip_pcie_read(struct rockchip_pcie *rockchip, u32 reg)
> > --
> > 2.29.2
> >
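For reference, the precedence rule in the probe hunk quoted above (a
non-negative command-line value overrides the device-tree value, a negative
one defers to it, and a result of 0 means no delay) can be sketched as a
standalone userspace function. This is only an illustration, not driver
code: pick_bus_scan_delay_ms() and check_precedence() are hypothetical
names that do not appear in the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring the selection logic in the quoted
 * rockchip_pcie_probe() hunk. cmdline_ms models the module parameter
 * (default -1); dt_ms models the value parsed from the device tree
 * (0 when the property is absent).
 */
uint32_t pick_bus_scan_delay_ms(int cmdline_ms, uint32_t dt_ms)
{
	/* A command-line value >= 0 always wins, including an explicit 0. */
	if (cmdline_ms >= 0)
		return (uint32_t)cmdline_ms;

	/* Negative values are ignored, so fall back to the device tree. */
	return dt_ms;
}

/* Small self-check of the cases described in kernel-parameters.txt. */
void check_precedence(void)
{
	assert(pick_bus_scan_delay_ms(-1, 0) == 0);      /* nothing set */
	assert(pick_bus_scan_delay_ms(-1, 725) == 725);  /* DT only */
	assert(pick_bus_scan_delay_ms(800, 725) == 800); /* cmdline wins */
	assert(pick_bus_scan_delay_ms(0, 725) == 0);     /* cmdline disables */
}
```

Note that under this rule a command-line value of 0 disables a delay that
the device tree would otherwise apply, since the driver sleeps only when
the chosen value is greater than 0.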