ath11k: QCA6390 on Dell XPS 13 and kernel crashes
wi nk
wink at technolu.st
Wed Dec 9 10:55:50 EST 2020
On Wed, Dec 9, 2020 at 4:50 PM Kalle Valo <kvalo at codeaurora.org> wrote:
>
> wi nk <wink at technolu.st> writes:
>
> > On Wed, Dec 9, 2020 at 4:35 PM Kalle Valo <kvalo at codeaurora.org> wrote:
> >>
> >> wi nk <wink at technolu.st> writes:
> >>
> >> > So I've managed to stabilise my system now, so either the race is
> >> > gone, or I've done something to win it all the time. So one of the
> >> > avenues of racing I was chasing at first was in the ath11k driver
> >> > itself. There are a couple areas where the single/shared IRQ is being
> >> > forcibly toggled in ways that the documentation says are not great
> >> > (and the original patch was trying to avoid). Fixing those didn't
> >> > seem to have much impact on the stability of things (I've included
> >> > those changes in my patch though). After the last email I was
> >> > thinking about the MHI side of things a bit more and found a number of
> >> > call sites that my naive grepping had missed that do the same thing,
> >> > but via acquiring a lock at the same time. I modified all the calls
> >> > to *_lock_irq and *_unlock_irq to the lock/unlock - save/restore
> >> > variants that accept the flags parameter to capture state. I've now
> >> > booted and loaded the driver 10+ times without a single freeze or
> >> > crash. I'm not sure all of those modifications are necessary (ie:
> >> > which things are re-entrant in this single interrupt operating mode vs
> >> > which ones can use the simpler lock/unlock mechanisms), so I could use
> >> > some advice/guidance there.
> >> >
> >> > Mitchell - if you want to grab this patch and try it, let me know how
> >> > it goes and I can clean it up for the mailing list:
> >> > https://github.com/w1nk/ath11k-debug/blob/master/one-irq-manage.patch
> >> > (apply to ath11k-qca6390-bringup-202011301608)
> >>
> >> Wink, I want to ask more about the very interesting
> >> one-irq-manage.patch you wrote. Have you seen the "sched: RT throttling
> >> activated" crash with that patch? If yes, how many times, for example 5
> >> out of 10 times or something like that?
> >>
> >> Or is it so with one-irq-manage.patch the kernel doesn't crash at all? I
> >> didn't quite understand the situation.
> >>
> >> --
> >> https://patchwork.kernel.org/project/linux-wireless/list/
> >>
> >> https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
> >
> > Kalle,
> >
> > Sorry for moving the thread :).
>
> No problem, I'll just ask extra questions to make sure that I'm
> understanding things correctly :)
>
> > So I've attempted 2 patches that seem to produce varying degrees of
> > success. The single IRQ patch took the crashing behaviour from hard
> > locking immediately, to that stuttering / RT throttling message
> > consistently. So instead of hard locking 9/10 times and stuttering
> > 1/10, it was inverted.
>
> Ok, got it now.
>
> > The second patch, which disables the m2 transition (even without the
> > single IRQ patch), seems to have resolved the issues altogether, but at
> > the expense of disabling this m2 state, and I don't have much idea of
> > the consequences of that.
>
> Sorry, I have missed that. What second patch are you talking about?
>
> Also can you share your /proc/interrupts in full?
>
> --
> https://patchwork.kernel.org/project/linux-wireless/list/
>
> https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
>
> --
> ath11k mailing list
> ath11k at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/ath11k
Here's /proc/interrupts in full, and the short patch after:
        CPU0    CPU1    CPU2    CPU3    CPU4    CPU5    CPU6    CPU7
   0:       7       0       0       0       0       0       0       0  IO-APIC 2-edge timer
   1:       0       0       0       0       0       0       0    2923  IO-APIC 1-edge i8042
   8:       0       0       0       0       0       0       0       0  IO-APIC 8-edge rtc0
   9:       0    9290       0       0       0       0       0       0  IO-APIC 9-fasteoi acpi
  12:       0       0       0       0       0       0      53       0  IO-APIC 12-edge i8042
  14:       0   29816       0       0       0       0       0       0  IO-APIC 14-fasteoi INT34C5:00
  16:       0       0       0       0       0   10376       0       0  IO-APIC 16-fasteoi intel_ish_ipc, i801_smbus, idma64.4
  27:       0       0       0       0       0       0       0       0  IO-APIC 27-fasteoi idma64.0, i2c_designware.0
  31:       0       0       0       0       0       0       0       0  IO-APIC 31-fasteoi idma64.2, i2c_designware.2
  32:       0       0       0       0       0       0       0       0  IO-APIC 32-fasteoi idma64.3, i2c_designware.3
  40:    9681  777197   27906       0       0       0       0       0  IO-APIC 40-fasteoi idma64.1, i2c_designware.1
 120:       0       0       0       0       0       0       0       0  PCI-MSI 114688-edge PCIe PME, pciehp
 121:       0       0       0       0       0       0       0       0  PCI-MSI 118784-edge PCIe PME, pciehp
 122:       0       0       0       0       0       0       0       0  PCI-MSI 458752-edge PCIe PME
 123:       0       0       0       0       0       0       0       0  PCI-MSI 475136-edge PCIe PME
 124:       0       0       1       0       0       0       0       0  PCI-MSI 229376-edge vmd
 125:       0       0       0      27       0       0       0       0  PCI-MSI 229377-edge vmd
 126:       0       0       0       0    4303       0       0       0  PCI-MSI 229378-edge vmd
 127:       0       0       0       0       0    2992       0     434  PCI-MSI 229379-edge vmd
 128:       0       0       0       0       0     593    2504       0  PCI-MSI 229380-edge vmd
 129:       0       0       0       0     699       0    1061    1873  PCI-MSI 229381-edge vmd
 130:    2382     394       0     603       0       0       0       0  PCI-MSI 229382-edge vmd
 131:       0    1670       0     406     646       0       0       0  PCI-MSI 229383-edge vmd
 132:     692       0    2903       0       0       0       0       0  PCI-MSI 229384-edge vmd
 133:       0     518     913    2198       0       0       0       0  PCI-MSI 229385-edge vmd
 134:       0       0       0       0       0       0       0       0  PCI-MSI 229386-edge vmd
 135:       0       0       0       0       0       0       0       0  PCI-MSI 229387-edge vmd
 136:       0       0       0       0       0       0       0       0  PCI-MSI 229388-edge vmd
 137:       0       0       0       0       0       0       0       0  PCI-MSI 229389-edge vmd
 138:       0       0       0       0       0       0       0       0  PCI-MSI 229390-edge vmd
 139:       0       0       0       0       0       0       0       0  PCI-MSI 229391-edge vmd
 140:       0       0       0       0       0       0       0       0  PCI-MSI 229392-edge vmd
 141:       0       0       0       0       0       0       0       0  PCI-MSI 229393-edge vmd
 142:       0       0       0       0       0       0       0       0  PCI-MSI 229394-edge vmd
 143:       0       0       0       0       0       0       0       0  VMD-MSI 124 PCIe PME, aerdrv, pcie-dpc
 144:       0       0       0       0       0       0       1       0  PCI-MSI 212992-edge xhci_hcd
 145:       0       0       0       0       0       0       0      72  PCI-MSI 327680-edge xhci_hcd
 146:       6       0       0       0       0       0       0       0  PCI-MSI 45088768-edge rtsx_pci
 147:       0       0       0       0       0       0       0       0  VMD-MSI 125 nvme0q0
 148:       0       0       0    1859       0       0       0   38399  PCI-MSI 32768-edge i915
 149:       0       0       0       0       0       0       0       0  VMD-MSI 126 nvme0q1
 150:       0       0       0       0       0       0       0       0  VMD-MSI 127 nvme0q2
 151:       0       0       0       0       0       0       0       0  VMD-MSI 128 nvme0q3
 152:       0       0       0       0       0       0       0       0  VMD-MSI 129 nvme0q4
 153:       0       0       0       0       0       0       0       0  VMD-MSI 130 nvme0q5
 154:       0       0       0       0       0       0       0       0  VMD-MSI 131 nvme0q6
 155:       0       0       0       0       0       0       0       0  VMD-MSI 132 nvme0q7
 156:       0       0       0       0       0       0       0       0  VMD-MSI 133 nvme0q8
 157:       0   29816       0       0       0       0       0       0  INT34C5:00 327 DLL0945:00
 158:       0       0       0       0       0       0      48       0  PCI-MSI 360448-edge mei_me
 159:       0       0       0       0       0       0       0    1134  PCI-MSI 514048-edge AudioDSP
 162:       0       0       0  108102       0       0       0       0  PCI-MSI 44564480-edge ce0, ce1, ce2, ce3, ce5, ce7, ce8, DP_EXT_IRQ, DP_EXT_IRQ, DP_EXT_IRQ, DP_EXT_IRQ, DP_EXT_IRQ, DP_EXT_IRQ, DP_EXT_IRQ, DP_EXT_IRQ, DP_EXT_IRQ, DP_EXT_IRQ, bhi, mhi, mhi
 NMI:       0       0       0       0       0       0       0       0  Non-maskable interrupts
 LOC:   64516   80387   54151   82574   64663  113373   58033   81555  Local timer interrupts
 SPU:       0       0       0       0       0       0       0       0  Spurious interrupts
 PMI:       0       0       0       0       0       0       0       0  Performance monitoring interrupts
 IWI:       5       2       1     760       1       1       0   16078  IRQ work interrupts
 RTR:       6       0       0       0       0       0       0       0  APIC ICR read retries
 RES:    1834    7304    1432    1807    3015    1552    1417    1498  Rescheduling interrupts
 CAL:   21739   26798   28934   22211   22590   28622   22541   20023  Function call interrupts
 TLB:   51267   49182   59392   48384   46755   56491   48103   46560  TLB shootdowns
 TRM:       2       2       2       2       2       2       2       2  Thermal event interrupts
 THR:       0       0       0       0       0       0       0       0  Threshold APIC interrupts
 DFR:       0       0       0       0       0       0       0       0  Deferred Error APIC interrupts
 MCE:       0       0       0       0       0       0       0       0  Machine check exceptions
 MCP:       3       4       4       4       4       4       4       4  Machine check polls
 ERR:      16
 MIS:       0
 PIN:       0       0       0       0       0       0       0       0  Posted-interrupt notification event
 NPI:       0       0       0       0       0       0       0       0  Nested posted-interrupt event
 PIW:       0       0       0       0       0       0       0       0  Posted-interrupt wakeup event
and the modification that disables m2 state:
diff --git a/drivers/bus/mhi/core/pm.c b/drivers/bus/mhi/core/pm.c
index 3de7b1639ec6..20f670c8b129 100644
--- a/drivers/bus/mhi/core/pm.c
+++ b/drivers/bus/mhi/core/pm.c
@@ -55,12 +55,12 @@ static struct mhi_pm_transitions const dev_state_transitions[] = {
},
{
MHI_PM_M0,
- MHI_PM_M0 | MHI_PM_M2 | MHI_PM_M3_ENTER |
+ MHI_PM_M0 | MHI_PM_M3_ENTER |
MHI_PM_SYS_ERR_DETECT | MHI_PM_SHUTDOWN_PROCESS |
MHI_PM_LD_ERR_FATAL_DETECT | MHI_PM_FW_DL_ERR
},
{
- MHI_PM_M2,
+ MHI_PM_M0,
MHI_PM_M0 | MHI_PM_SYS_ERR_DETECT | MHI_PM_SHUTDOWN_PROCESS |
MHI_PM_LD_ERR_FATAL_DETECT
},