ath11k: crashes with 1 MSI vector, workaround disable MHI M2 state

Stephen Liang stephenliang7 at gmail.com
Thu Dec 17 14:01:37 EST 2020


In my last test run, the system hangs were reproducible only
occasionally, during WiFi scanning (opening the GNOME WiFi settings to
see the list of networks, or opening the network list dropdown,
triggers the problem). When this happens, one of two things follows,
usually within a minute:

1. The system hangs
2. The firmware crashes

Below are links to the MHI debug logs, which were generated after
enabling dynamic debug with:

    echo -n 'module mhi +p' > /sys/kernel/debug/dynamic_debug/control

Firmware crash logs (no hang): https://pastebin.com/raw/E0y49evA
Lockup: https://i.imgur.com/0XExack.jpg
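For anyone who wants to script the log collection, the dynamic debug command above can also be issued from C. This is a minimal sketch, assuming debugfs is mounted at /sys/kernel/debug and the caller runs as root; write_dyndbg_cmd() is a hypothetical helper name, not a kernel or library API. The control file path is a parameter so the helper can be tried against an ordinary file.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: C equivalent of
 *   echo -n 'module mhi +p' > /sys/kernel/debug/dynamic_debug/control
 * Returns 0 on success, -1 on failure.
 */
static int write_dyndbg_cmd(const char *control_path, const char *cmd)
{
    assert(control_path != NULL && cmd != NULL);

    FILE *f = fopen(control_path, "w");
    if (f == NULL) {
        perror("fopen");
        return -1;
    }

    size_t len = strlen(cmd);
    /* Like echo -n: write the command with no trailing newline. */
    int ok = fwrite(cmd, 1, len, f) == len;

    /* fclose() flushes the buffer, so a write error reported by the
     * kernel may only show up here. */
    if (fclose(f) != 0)
        ok = 0;

    return ok ? 0 : -1;
}
```

From a root process this would be called as write_dyndbg_cmd("/sys/kernel/debug/dynamic_debug/control", "module mhi +p"); the MHI debug messages then appear in dmesg.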

On Thu, Dec 17, 2020 at 1:53 AM Manivannan Sadhasivam
<manivannan.sadhasivam at linaro.org> wrote:
>
> Hi Kalle,
>
> On Wed, Dec 16, 2020 at 10:47:18AM +0200, Kalle Valo wrote:
> > Hi MHI devs,
> >
>
> [...]
>
> > After extensive debugging, wink found out that disabling the M2 state
> > makes all the problems go away:
> >
> > --- a/drivers/bus/mhi/core/pm.c
> > +++ b/drivers/bus/mhi/core/pm.c
> > @@ -55,12 +55,12 @@ static struct mhi_pm_transitions const dev_state_transitions[] = {
> >         },
> >         {
> >                 MHI_PM_M0,
> > -               MHI_PM_M0 | MHI_PM_M2 | MHI_PM_M3_ENTER |
> > +               MHI_PM_M0 | MHI_PM_M3_ENTER |
> >                 MHI_PM_SYS_ERR_DETECT | MHI_PM_SHUTDOWN_PROCESS |
> >                 MHI_PM_LD_ERR_FATAL_DETECT | MHI_PM_FW_DL_ERR
> >         },
> >         {
> > -               MHI_PM_M2,
> > +               MHI_PM_M0,
> >                 MHI_PM_M0 | MHI_PM_SYS_ERR_DETECT | MHI_PM_SHUTDOWN_PROCESS |
> >                 MHI_PM_LD_ERR_FATAL_DETECT
> >         },
> >
> > And indeed now we have numerous people reporting that with this
> > workaround ath11k is stable on their Dell XPS 13 9310 laptops. What on
> > earth could cause these kernel crashes/interrupt storms? And why is it
> > visible only on Dell laptops? Why does disabling M2 state fix it?
> >
>
> This is related to the ASPM state of the PCIe bus. In the meantime, I'd
> suggest turning off ASPM by adding "pcie_aspm=off" to the kernel command
> line so that the MHI bus stays in M0.
>
> For debugging this issue, can someone enable debug logs for MHI and share
> the dmesg output (with ASPM enabled, of course)?
>
> Thanks,
> Mani
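The quoted workaround edits dev_state_transitions[] in drivers/bus/mhi/core/pm.c. As far as I understand it, each entry pairs a from_state with a to_states bitmask, and mhi_tryset_pm_state() refuses a requested state whose bit is not set in the current state's mask. Below is a simplified standalone model of that check, not the kernel code: the state names, bit values and transition_allowed() are made up for illustration, and only the M0 row is modelled, before and after dropping M2 from its mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative states and bit values only; the real enum mhi_pm_state
 * has more states and different values. */
enum pm_state {
    PM_M0       = 1u << 0,
    PM_M2       = 1u << 1,
    PM_M3_ENTER = 1u << 2,
    PM_SYS_ERR  = 1u << 3,
};

struct pm_transition {
    enum pm_state from_state;
    uint32_t to_states;  /* bitmask of states reachable from from_state */
};

/* M0 row as in the unpatched table: M0 may move to M2. */
static const struct pm_transition m0_stock = {
    PM_M0, PM_M0 | PM_M2 | PM_M3_ENTER | PM_SYS_ERR
};

/* M0 row with the workaround applied: the M2 bit is dropped. */
static const struct pm_transition m0_no_m2 = {
    PM_M0, PM_M0 | PM_M3_ENTER | PM_SYS_ERR
};

/* A transition is allowed only if the row describes the current state
 * and the target state's bit is set in its to_states mask. */
static bool transition_allowed(const struct pm_transition *row,
                               enum pm_state cur, enum pm_state next)
{
    assert(row != NULL);
    if (row->from_state != cur)
        return false;
    return (row->to_states & (uint32_t)next) != 0;
}
```

In this model, removing the M2 bit is enough to make an M0 to M2 request fail while M3 entry and error handling still work. It says nothing about why entering M2 misbehaves on these laptops.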
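Since the suggested workaround is a boot parameter, it is worth confirming after a reboot that pcie_aspm=off actually reached the kernel, which /proc/cmdline shows. This is a minimal sketch in C, assuming a whitespace-separated command line without quoted arguments; cmdline_has_aspm_off() and read_proc_cmdline() are hypothetical helper names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: true if the command line contains the exact
 * token "pcie_aspm=off". Whole-token matching avoids false positives
 * such as "xpcie_aspm=off". Quoted arguments are not handled. */
static bool cmdline_has_aspm_off(const char *cmdline)
{
    const char *want = "pcie_aspm=off";
    size_t want_len = strlen(want);
    const char *p = cmdline;

    assert(cmdline != NULL);
    while (*p != '\0') {
        /* Skip whitespace to the start of the next token. */
        while (*p == ' ' || *p == '\t' || *p == '\n')
            p++;
        const char *start = p;
        while (*p != '\0' && *p != ' ' && *p != '\t' && *p != '\n')
            p++;
        size_t len = (size_t)(p - start);
        if (len == want_len && strncmp(start, want, want_len) == 0)
            return true;
    }
    return false;
}

/* Hypothetical helper: reads /proc/cmdline into buf.
 * Returns 0 on success, -1 on failure. */
static int read_proc_cmdline(char *buf, size_t size)
{
    FILE *f = fopen("/proc/cmdline", "r");
    if (f == NULL)
        return -1;
    if (fgets(buf, (int)size, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return 0;
}
```

Passing the buffer filled by read_proc_cmdline() to cmdline_has_aspm_off() tells whether the running kernel was booted with the parameter.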


