ath11k: QCA6390 on Dell XPS 13 and kernel crashes
wi nk
wink at technolu.st
Sun Dec 6 03:05:57 EST 2020
On Sat, Dec 5, 2020 at 8:17 PM wi nk <wink at technolu.st> wrote:
>
> On Tue, Dec 1, 2020 at 11:17 AM wi nk <wink at technolu.st> wrote:
> >
> > On Mon, Nov 30, 2020 at 6:02 PM wi nk <wink at technolu.st> wrote:
> > >
> > > On Mon, Nov 30, 2020 at 5:55 PM Kalle Valo <kvalo at codeaurora.org> wrote:
> > > >
> > > > Hi Wi and Thomas,
> > > >
> > > > I'll start a new thread about problems on XPS 13. The information is
> > > > scattered to different threads and hard to find everything, it's much
> > > > easier to have everything in one place. So let's continue the discussion
> > > > about the kernel crashes on this thread.
> > > >
> > > > Here's what I have understood so far:
> > > >
> > > > * On Dell XPS 15 there are no issues with QCA6390 and it seems to work
> > > > with 32 MSI vectors.
> > > >
> > > > * On Dell XPS 13 there's a BIOS bug and kernel prints:
> > > >
> > > > [ 0.050130] DMAR: [Firmware Bug]: Your BIOS is broken; DMAR reported at address 0!
> > > > BIOS vendor: Dell Inc.; Ver: 1.1.1; Product Version:
> > > >
> > > > * Because of this BIOS bug QCA6390 only gets one MSI vector on Dell XPS
> > > > 13. We added a hack to ath11k to make it work with only one vector and after
> > > > that it's possible to boot the firmware, connect to the AP and use the
> > > > device for a while.
> > > >
> > > > * But the problem now is that the kernel is crashing almost immediately
> > > > and almost every time(?). And these crashes only happen on Dell XPS
> > > > 13, all other systems (including Dell XPS 15) seem to work without
> > > > issues.
> > > >
> > > > Is my understanding correct? Did I miss anything?
> > > >
> > > > About the symptoms Wi reports:
> > > >
> > > > ----------------------------------------------------------------------
> > > > So up until this point, everything is working without issues.
> > > > Everything seems to spiral out of control a couple of seconds later
> > > > when my system attempts to actually bring up the adapter. In most of
> > > > the crash states I will see this:
> > > >
> > > > [ 31.286725] wlp85s0: send auth to ec:08:6b:27:01:ea (try 1/3)
> > > > [ 31.390187] wlp85s0: send auth to ec:08:6b:27:01:ea (try 2/3)
> > > > [ 31.391928] wlp85s0: authenticated
> > > > [ 31.394196] wlp85s0: associate with ec:08:6b:27:01:ea (try 1/3)
> > > > [ 31.396513] wlp85s0: RX AssocResp from ec:08:6b:27:01:ea
> > > > (capab=0x411 status=0 aid=6)
> > > > [ 31.407730] wlp85s0: associated
> > > > [ 31.434354] IPv6: ADDRCONF(NETDEV_CHANGE): wlp85s0: link becomes ready
> > > >
> > > > And then either somewhere in that pile of messages, or a second or two
> > > > after this my machine will start to stutter as I mentioned before, and
> > > > then it either hangs, or I see this message (I'm truncating the
> > > > timestamp):
> > > >
> > > > [ 35.xxxx ] sched: RT throttling activated
> > > >
> > > > After that moment, the machine is unresponsive. Sorry I can't seem to
> > > > extract this data other than screenshots from my phone at the moment,
> > > > you can see the dmesg output from 6 different hangs here:
> > > >
> > > > https://github.com/w1nk/ath11k-debug
> > > > ----------------------------------------------------------------------
> > > >
> > > > And Thomas Krause reports:
> > > >
> > > > ----------------------------------------------------------------------
> > > > I can confirm this behavior on my configuration. I managed to login
> > > > once and select the Wifi and connect to it. Curiously enough, it seemed
> > > > to be stable long enough to enter the Wifi passphrase. After the
> > > > connection was established, the system hung and on each attempt to
> > > > reboot into the graphical system it would freeze at some point
> > > > (sometimes even before showing the login screen).
> > > > ----------------------------------------------------------------------
> > > >
> > > > --
> > > > https://patchwork.kernel.org/project/linux-wireless/list/
> > > >
> > > > https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
> > >
> > > Hi Kalle,
> > >
> > > Again, thanks much for your work. I think you've summarized
> > > everything up until this point. On my XPS 13 9310 the RT throttling
> > > still occurs occasionally when loading the driver/associating
> > > with an AP. The throttling consistently occurs
> > > after a few sets of the MHI debug printing showing the EE entering an
> > > invalid state ( AMSS -> INVALID_EE ). I'm now building the latest tag
> > > to see if there are any differences.
> > >
> > > Thanks!
> >
> > Just to follow up, the first boot resulted in the RT throttling
> > message as the adapter was coming up/associating. Shortly after, the
> > firmware crashed; the kernel didn't fully freeze, but I needed to
> > reboot to bring the adapter back.
>
> Kalle -
>
> I've noticed one additional behavior that may give someone with
> familiarity with the QCA hardware a clue. I'm running
> ath11k-qca6390-bringup-202011301608 on the Dell XPS 13 9310. For
> whatever reason, having the bluetooth subsystem enabled (with a paired
> device) on this Dell basically guarantees I'll hit the scheduler
> throttling issue as the ath11k driver is initializing / associating.
> The bluetooth system is using the btqca driver. I don't have any
> useful debugging (I'll gladly collect some if there is a way to do it)
> other than tracking some simple statistics. I booted my system 20
> times: 10 times with bluetooth enabled (and some headphones turned on,
> ready to pair), and 10 times without. In both scenarios, I'm booting
> into X and manually modprobing the ath11k driver. With bluetooth on,
> the headphones are paired by the time I modprobe the driver, and I
> received the throttling message and subsequent freezing 10/10 times.
> With bluetooth off / my headphones not paired, I only saw it 2/10
> times. I know it's not much hard information, but it's reliably
> reproducible for me. Is there anything useful I can collect?
Well, unfortunately I think the bluetooth was just a red herring in the
race. To chase that, I disabled all bluetooth and was able to get
into a state where I had 6 failed boots in a row. To poke around
further, I rebuilt the kernel with localmodconfig to avoid building
big chunks of things. This kernel is way less stable and seems to
freeze most of the time (though it does occasionally remain stable). I'm
not sure what else got disabled in there, but it seems to have had a
negative impact on the crash race.
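
As a rough check on the 10/10 versus 2/10 boot counts quoted earlier in
this message, here is a minimal sketch of a two-sided Fisher exact test.
It is not part of the original report; the function name is illustrative,
and it assumes every boot is an independent trial. The six consecutive
failed boots with bluetooth disabled suggest that assumption may not hold,
so the resulting p-value should be read with caution.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x first-column events in row 1.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum every possible table that is at least as unlikely as the observed one.
    # The small tolerance guards against floating-point ties.
    return sum(
        p for p in (prob(x) for x in range(lo, hi + 1))
        if p <= observed * (1 + 1e-9)
    )


# Rows: bluetooth on / bluetooth off. Columns: froze / did not freeze.
p_value = fisher_exact_two_sided(10, 0, 2, 8)
print(f"p = {p_value:.6f}")
```

A small p-value here only says the two sets of ten boots differ more than
chance alone would predict under independence; it does not say bluetooth
is the cause, which fits the later runs that froze with bluetooth off.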