ath11k: QCA6390 on Dell XPS 13 and kernel crashes

wi nk wink at technolu.st
Wed Dec 9 10:50:14 EST 2020


On Wed, Dec 9, 2020 at 4:39 PM wi nk <wink at technolu.st> wrote:
>
> On Wed, Dec 9, 2020 at 4:35 PM Kalle Valo <kvalo at codeaurora.org> wrote:
> >
> > wi nk <wink at technolu.st> writes:
> >
> > > So I've managed to stabilise my system now, so either the race is
> > > gone, or I've done something to win it all the time.  So one of the
> > > avenues of racing I was chasing at first was in the ath11k driver
> > > itself.  There are a couple areas where the single/shared IRQ is being
> > > forcibly toggled in ways that the documentation says are not great
> > > (and the original patch was trying to avoid).  Fixing those didn't
> > > seem to have much impact on the stability of things (I've included
> > > those changes in my patch though).  After the last email I was
> > > thinking about the MHI side of things a bit more and found a number of
> > > call sites that my naive grepping had missed that do the same thing,
> > > but via acquiring a lock at the same time.  I modified all the calls
> > > to *_lock_irq and *_unlock_irq to the lock/unlock - save/restore
> > > variants that accept the flags parameter to capture state.  I've now
> > > booted and loaded the driver 10+ times without a single freeze or
> > > crash.  I'm not sure all of those modifications are necessary (ie:
> > > which things are re-entrant in this single interrupt operating mode vs
> > > which ones can use the simpler lock/unlock mechanisms), so I could use
> > > some advice/guidance there.
> > >
> > > Mitchell - if you want to grab this patch and try it, let me know how
> > > it goes and I can clean it up for the mailing list:
> > > https://github.com/w1nk/ath11k-debug/blob/master/one-irq-manage.patch
> > > (apply to ath11k-qca6390-bringup-202011301608)
> >
> > Wink, I want to ask more about the very interesting
> > one-irq-manage.patch you wrote. Have you seen the "sched: RT throttling
> > activated" crash with that patch? If yes, how often, for example 5
> > out of 10 times or something like that?
> >
> > Or is it that with one-irq-manage.patch the kernel doesn't crash at
> > all? I didn't quite understand the situation.
> >
> > --
> > https://patchwork.kernel.org/project/linux-wireless/list/
> >
> > https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
>
> Kalle,
>
>    Sorry for moving the thread :).  So I've attempted 2 patches that
> seem to produce varying degrees of success.  The single IRQ patch took
> the crashing behaviour from hard locking immediately to consistently
> hitting that stuttering / RT throttling message.  So instead of hard
> locking 9/10 times and stuttering 1/10, it was inverted.
>
> The second patch, disabling the m2 transition (even without the single
> IRQ patch), seems to have resolved the issues altogether, but at the
> expense of disabling this m2 state, and I don't have much idea of the
> consequences of that.

Sorry, one more point of clarification: after the first patch I made, I
was able to always bring the adapter up while not on the charger.  I
didn't test that mode on an unmodified bringup branch.  I suspect it
would eventually crash, though, and that I have just modified some of
the racing parameters.  I can confirm that if it'd be useful information
for you.  It seems the key is that sometimes something causes this M2
state transition (in my original observation, plugging in the charger),
and that most of the time that state transition causes the EE to become
invalid, and then everything goes sideways.  It does seem like the
adapter can successfully survive the transition occasionally, just not
often.  Preventing the transition entirely seems to keep the race from
ever occurring, but doesn't really solve it.
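
For anyone following along, here is a minimal userspace sketch of why the
*_lock_irq to irqsave/irqrestore change described in the quoted message
earlier in this thread can matter.  It is not kernel code and not taken
from the patch: a single flag stands in for the local CPU's interrupt
state, and every model_* / irqs_after_* name is hypothetical.  It only
models the documented difference between the two families (for example
spin_lock_irq() vs spin_lock_irqsave()): the plain unlock re-enables
interrupts unconditionally, while the restore variant puts back whatever
state the caller had.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only, not kernel code: this flag stands in for the
 * local CPU's interrupt-enable state. All names here are hypothetical. */
static bool irqs_enabled = true;

/* Models the *_lock_irq / *_unlock_irq pair: unlock always re-enables. */
static void model_lock_irq(void)   { irqs_enabled = false; }
static void model_unlock_irq(void) { irqs_enabled = true; }

/* Models the irqsave / irqrestore pair: remember the state on entry... */
static unsigned long model_lock_irqsave(void)
{
    unsigned long flags = irqs_enabled ? 1UL : 0UL;

    irqs_enabled = false;
    return flags;
}

/* ...and put back exactly that state on exit. */
static void model_unlock_irqrestore(unsigned long flags)
{
    irqs_enabled = (flags != 0UL);
}

/* Interrupt state seen by the caller after a plain lock/unlock pair. */
bool irqs_after_plain_pair(bool enabled_on_entry)
{
    irqs_enabled = enabled_on_entry;
    model_lock_irq();
    model_unlock_irq();
    return irqs_enabled;
}

/* Interrupt state seen by the caller after a save/restore pair. */
bool irqs_after_saving_pair(bool enabled_on_entry)
{
    unsigned long flags;

    irqs_enabled = enabled_on_entry;
    flags = model_lock_irqsave();
    model_unlock_irqrestore(flags);
    return irqs_enabled;
}

/* Optional self-check of the two behaviours modelled above. */
void model_self_check(void)
{
    /* Entered with IRQs off: the plain pair turns them back on. */
    assert(irqs_after_plain_pair(false) == true);
    /* Entered with IRQs off: the saving pair leaves them off. */
    assert(irqs_after_saving_pair(false) == false);
}
```

If a path can be entered with interrupts already disabled, the plain
pair hands control back with interrupts enabled, which is the kind of
forced toggling the quoted message describes.  Whether each converted
call site can actually be entered that way is exactly the open question
raised there; this sketch does not answer it.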


