ath11k: QCA6390 on Dell XPS 13 and kernel crashes

wi nk wink at technolu.st
Wed Dec 9 10:55:50 EST 2020


On Wed, Dec 9, 2020 at 4:50 PM Kalle Valo <kvalo at codeaurora.org> wrote:
>
> wi nk <wink at technolu.st> writes:
>
> > On Wed, Dec 9, 2020 at 4:35 PM Kalle Valo <kvalo at codeaurora.org> wrote:
> >>
> >> wi nk <wink at technolu.st> writes:
> >>
> >> > So I've managed to stabilise my system now, so either the race is
> >> > gone, or I've done something to win it all the time.  So one of the
> >> > avenues of racing I was chasing at first was in the ath11k driver
> >> > itself.  There are a couple areas where the single/shared IRQ is being
> >> > forcibly toggled in ways that the documentation says are not great
> >> > (and the original patch was trying to avoid).  Fixing those didn't
> >> > seem to have much impact on the stability of things (I've included
> >> > those changes in my patch though).  After the last email I was
> >> > thinking about the MHI side of things a bit more and found a number of
> >> > call sites that my naive grepping had missed that do the same thing,
> >> > but via acquiring a lock at the same time.  I modified all the calls
> >> > to *_lock_irq and *_unlock_irq to the lock/unlock - save/restore
> >> > variants that accept the flags parameter to capture state.  I've now
> >> > booted and loaded the driver 10+ times without a single freeze or
> >> > crash.  I'm not sure all of those modifications are necessary (ie:
> >> > which things are re-entrant in this single interrupt operating mode vs
> >> > which ones can use the simpler lock/unlock mechanisms), so I could use
> >> > some advice/guidance there.
> >> >
> >> > Mitchell - if you want to grab this patch and try it, let me know how
> >> > it goes and I can clean it up for the mailing list:
> >> > https://github.com/w1nk/ath11k-debug/blob/master/one-irq-manage.patch
> >> > (apply to ath11k-qca6390-bringup-202011301608)
> >>
> >> Wink, I want to ask more about your the very interesting
> >> one-irq-manage.patch you wrote. Have you seen the "sched: RT throttling
> >> activated" crash with that patch? If yes, how many times, for example 5
> >> out of 10 times or something like that?
> >>
> >> Or is it so with one-irq-manage.patch the kernel doesn't crash at all? I
> >> didn't quite understand the situation.
> >>
> >> --
> >> https://patchwork.kernel.org/project/linux-wireless/list/
> >>
> >> https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
> >
> > Kalle,
> >
> >    Sorry for moving the thread :).
>
> No problem, I'll just make extra questions to make sure that I'm
> understanding things correctly :)
>
> > So I've attempted 2 patches that seem to produce varying degrees of
> > success. The single IRQ patch took the crashing behaviour from hard
> > locking immediately, to that stuttering / RT throttling message
> > consistently. So instead of hard locking 9/10 times and stuttering
> > 1/10, it was inverted.
>
> Ok, got it now.
>
> > The second patch disabling the m2 transition (even without the single
> > IRQ patch) seems to have resolved the issues altogether, but at the
> > expense of disabling this m2 state, which I don't have much idea of
> > the consequences..
>
> Sorry, I have missed that. What second patch are you talking about?
>
> Also can you share your /proc/interrupts in full?
>
> --
> https://patchwork.kernel.org/project/linux-wireless/list/
>
> https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
>
> --
> ath11k mailing list
> ath11k at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/ath11k

Here's interrupts in full , and the short patch after:

            CPU0       CPU1       CPU2       CPU3       CPU4
CPU5       CPU6       CPU7
   0:          7          0          0          0          0
0          0          0   IO-APIC    2-edge      timer
   1:          0          0          0          0          0
0          0       2923   IO-APIC    1-edge      i8042
   8:          0          0          0          0          0
0          0          0   IO-APIC    8-edge      rtc0
   9:          0       9290          0          0          0
0          0          0   IO-APIC    9-fasteoi   acpi
  12:          0          0          0          0          0
0         53          0   IO-APIC   12-edge      i8042
  14:          0      29816          0          0          0
0          0          0   IO-APIC   14-fasteoi   INT34C5:00
  16:          0          0          0          0          0
10376          0          0   IO-APIC   16-fasteoi   intel_ish_ipc,
i801_smbus, idma64.4
  27:          0          0          0          0          0
0          0          0   IO-APIC   27-fasteoi   idma64.0,
i2c_designware.0
  31:          0          0          0          0          0
0          0          0   IO-APIC   31-fasteoi   idma64.2,
i2c_designware.2
  32:          0          0          0          0          0
0          0          0   IO-APIC   32-fasteoi   idma64.3,
i2c_designware.3
  40:       9681     777197      27906          0          0
0          0          0   IO-APIC   40-fasteoi   idma64.1,
i2c_designware.1
 120:          0          0          0          0          0
0          0          0   PCI-MSI 114688-edge      PCIe PME, pciehp
 121:          0          0          0          0          0
0          0          0   PCI-MSI 118784-edge      PCIe PME, pciehp
 122:          0          0          0          0          0
0          0          0   PCI-MSI 458752-edge      PCIe PME
 123:          0          0          0          0          0
0          0          0   PCI-MSI 475136-edge      PCIe PME
 124:          0          0          1          0          0
0          0          0   PCI-MSI 229376-edge      vmd
 125:          0          0          0         27          0
0          0          0   PCI-MSI 229377-edge      vmd
 126:          0          0          0          0       4303
0          0          0   PCI-MSI 229378-edge      vmd
 127:          0          0          0          0          0
2992          0        434   PCI-MSI 229379-edge      vmd
 128:          0          0          0          0          0
593       2504          0   PCI-MSI 229380-edge      vmd
 129:          0          0          0          0        699
0       1061       1873   PCI-MSI 229381-edge      vmd
 130:       2382        394          0        603          0
0          0          0   PCI-MSI 229382-edge      vmd
 131:          0       1670          0        406        646
0          0          0   PCI-MSI 229383-edge      vmd
 132:        692          0       2903          0          0
0          0          0   PCI-MSI 229384-edge      vmd
 133:          0        518        913       2198          0
0          0          0   PCI-MSI 229385-edge      vmd
 134:          0          0          0          0          0
0          0          0   PCI-MSI 229386-edge      vmd
 135:          0          0          0          0          0
0          0          0   PCI-MSI 229387-edge      vmd
 136:          0          0          0          0          0
0          0          0   PCI-MSI 229388-edge      vmd
 137:          0          0          0          0          0
0          0          0   PCI-MSI 229389-edge      vmd
 138:          0          0          0          0          0
0          0          0   PCI-MSI 229390-edge      vmd
 139:          0          0          0          0          0
0          0          0   PCI-MSI 229391-edge      vmd
 140:          0          0          0          0          0
0          0          0   PCI-MSI 229392-edge      vmd
 141:          0          0          0          0          0
0          0          0   PCI-MSI 229393-edge      vmd
 142:          0          0          0          0          0
0          0          0   PCI-MSI 229394-edge      vmd
 143:          0          0          0          0          0
0          0          0   VMD-MSI  124  PCIe PME, aerdrv, pcie-dpc
 144:          0          0          0          0          0
0          1          0   PCI-MSI 212992-edge      xhci_hcd
 145:          0          0          0          0          0
0          0         72   PCI-MSI 327680-edge      xhci_hcd
 146:          6          0          0          0          0
0          0          0   PCI-MSI 45088768-edge      rtsx_pci
 147:          0          0          0          0          0
0          0          0   VMD-MSI  125  nvme0q0
 148:          0          0          0       1859          0
0          0      38399   PCI-MSI 32768-edge      i915
 149:          0          0          0          0          0
0          0          0   VMD-MSI  126  nvme0q1
 150:          0          0          0          0          0
0          0          0   VMD-MSI  127  nvme0q2
 151:          0          0          0          0          0
0          0          0   VMD-MSI  128  nvme0q3
 152:          0          0          0          0          0
0          0          0   VMD-MSI  129  nvme0q4
 153:          0          0          0          0          0
0          0          0   VMD-MSI  130  nvme0q5
 154:          0          0          0          0          0
0          0          0   VMD-MSI  131  nvme0q6
 155:          0          0          0          0          0
0          0          0   VMD-MSI  132  nvme0q7
 156:          0          0          0          0          0
0          0          0   VMD-MSI  133  nvme0q8
 157:          0      29816          0          0          0
0          0          0  INT34C5:00  327  DLL0945:00
 158:          0          0          0          0          0
0         48          0   PCI-MSI 360448-edge      mei_me
 159:          0          0          0          0          0
0          0       1134   PCI-MSI 514048-edge      AudioDSP
 162:          0          0          0     108102          0
0          0          0   PCI-MSI 44564480-edge      ce0, ce1, ce2,
ce3, ce5, ce7, ce8, DP_EXT_IRQ, DP_EXT_IRQ, DP_EXT_IRQ, DP_EXT_IRQ,
DP_EXT_IRQ, DP_EXT_IRQ, DP_EXT_IRQ, DP_EXT_IRQ, DP_EXT_IRQ,
DP_EXT_IRQ, bhi, mhi, mhi
 NMI:          0          0          0          0          0
0          0          0   Non-maskable interrupts
 LOC:      64516      80387      54151      82574      64663
113373      58033      81555   Local timer interrupts
 SPU:          0          0          0          0          0
0          0          0   Spurious interrupts
 PMI:          0          0          0          0          0
0          0          0   Performance monitoring interrupts
 IWI:          5          2          1        760          1
1          0      16078   IRQ work interrupts
 RTR:          6          0          0          0          0
0          0          0   APIC ICR read retries
 RES:       1834       7304       1432       1807       3015
1552       1417       1498   Rescheduling interrupts
 CAL:      21739      26798      28934      22211      22590
28622      22541      20023   Function call interrupts
 TLB:      51267      49182      59392      48384      46755
56491      48103      46560   TLB shootdowns
 TRM:          2          2          2          2          2
2          2          2   Thermal event interrupts
 THR:          0          0          0          0          0
0          0          0   Threshold APIC interrupts
 DFR:          0          0          0          0          0
0          0          0   Deferred Error APIC interrupts
 MCE:          0          0          0          0          0
0          0          0   Machine check exceptions
 MCP:          3          4          4          4          4
4          4          4   Machine check polls
 ERR:         16
 MIS:          0
 PIN:          0          0          0          0          0
0          0          0   Posted-interrupt notification event
 NPI:          0          0          0          0          0
0          0          0   Nested posted-interrupt event
 PIW:          0          0          0          0          0
0          0          0   Posted-interrupt wakeup event

and the modification that disables m2 state:

diff --git a/drivers/bus/mhi/core/pm.c b/drivers/bus/mhi/core/pm.c
index 3de7b1639ec6..20f670c8b129 100644
--- a/drivers/bus/mhi/core/pm.c
+++ b/drivers/bus/mhi/core/pm.c
@@ -55,12 +55,12 @@ static struct mhi_pm_transitions const
dev_state_transitions[] = {
     },
     {
         MHI_PM_M0,
-        MHI_PM_M0 | MHI_PM_M2 | MHI_PM_M3_ENTER |
+        MHI_PM_M0 | MHI_PM_M3_ENTER |
         MHI_PM_SYS_ERR_DETECT | MHI_PM_SHUTDOWN_PROCESS |
         MHI_PM_LD_ERR_FATAL_DETECT | MHI_PM_FW_DL_ERR
     },
     {
-        MHI_PM_M2,
+        MHI_PM_M0,
         MHI_PM_M0 | MHI_PM_SYS_ERR_DETECT | MHI_PM_SHUTDOWN_PROCESS |
         MHI_PM_LD_ERR_FATAL_DETECT
     },



More information about the ath11k mailing list