ath11k: crashes with 1 MSI vector, workaround disable MHI M2 state

wi nk wink at technolu.st
Sat Dec 19 16:34:23 EST 2020


On Thu, Dec 17, 2020 at 10:53 AM Manivannan Sadhasivam
<manivannan.sadhasivam at linaro.org> wrote:
>
> Hi Kalle,
>
> On Wed, Dec 16, 2020 at 10:47:18AM +0200, Kalle Valo wrote:
> > Hi MHI devs,
> >
>
> [...]
>
> > After extensive debugging from wink he found out that disabling M2 state
> > makes all the problems go away:
> >
> > --- a/drivers/bus/mhi/core/pm.c
> > +++ b/drivers/bus/mhi/core/pm.c
> > @@ -55,12 +55,12 @@ static struct mhi_pm_transitions const dev_state_transitions[] = {
> >         },
> >         {
> >                 MHI_PM_M0,
> > -               MHI_PM_M0 | MHI_PM_M2 | MHI_PM_M3_ENTER |
> > +               MHI_PM_M0 | MHI_PM_M3_ENTER |
> >                 MHI_PM_SYS_ERR_DETECT | MHI_PM_SHUTDOWN_PROCESS |
> >                 MHI_PM_LD_ERR_FATAL_DETECT | MHI_PM_FW_DL_ERR
> >         },
> >         {
> > -               MHI_PM_M2,
> > +               MHI_PM_M0,
> >                 MHI_PM_M0 | MHI_PM_SYS_ERR_DETECT | MHI_PM_SHUTDOWN_PROCESS |
> >                 MHI_PM_LD_ERR_FATAL_DETECT
> >         },
> >
> > And indeed now we have numerous people reporting that with this
> > workaround ath11k is stable on their Dell XPS 13 9310 laptops. What on
> > earth could cause these kernel crashes/interrupt storms? And why is it
> > visible only on Dell laptops? Why does disabling M2 state fix it?
> >
>
> This is related to the ASPM state of the PCIe bus. In the meantime, I'd
> suggest to turn off ASPM using "pcie_aspm=off" in the kernel command
> line so that the MHI bus stays in M0.
>
> For debugging this issue, can someone enable debug logs for MHI and share
> the dmesg output (with ASPM enabled ofc)?
>
> Thanks,
> Mani

Hi Mani,

  Thanks for the information and ideas.  I tried disabling ASPM with
the kernel parameter you mentioned, but that didn't seem to work, so I
removed ASPM support from my kernel altogether.  I still see the
adapter in the M1 state, from which it would have gone on to M2 had my
patch not disabled that transition.  Is ASPM the only thing that
triggers the M* transitions?  Would the device require a transition to
M2 regardless of settings (maybe that's why it tried)?  The MHI dmesg
output is pretty consistent when it fails; it looks like this:
https://i.imgur.com/0XExack.jpg
You can also see it in the MP4s I've placed here:
https://drive.google.com/drive/folders/1wvxZI5XtwPSrm0-6-Ov50cUfqBXSXeNz?usp=sharing
Also note that the failure isn't deterministic: sometimes the
transition to M2 succeeds and everything works.

Thanks!


