ath10k driver crashes whenever firmware crashes on ARM SoC

Kalle Valo kvalo at qca.qualcomm.com
Tue Mar 11 04:10:01 EDT 2014


Avery Pennarun <apenwarr at gmail.com> writes:

> On Tue, Mar 11, 2014 at 2:33 AM, Kalle Valo <kvalo at qca.qualcomm.com> wrote:
>
>> I showed your analysis to an HW engineer and the response I got was
>> "don't do that" (= don't use the cold reset). As you know, we now have a
>> workaround using the warm reset:
>>
>> 00f5482bcd94 ath10k: suspend hardware before reset
>> 9042e17df834 ath10k: refactor suspend/resume functions
>> fc36e3ffcdd0 ath10k: fix device initialization routine
>>
>> Have you tested these? Did they help at all?
>
> Yes, I've tested them and they help, mainly by doing the cold reset
> less often.  However, when the firmware hard crashes in certain ways
> (for example, using my original test case), it looks like warm reset
> can't fix that.  The driver then still must fall back to cold reset
> and, some fairly large percentage of the time (1/3rd?), crashes the
> bus.

Ok, thanks. I'll investigate more about the warm reset problems and try
to find ways to make it more reliable.

> We do have a separate reset line controlled by a GPIO.  Using that
> crashes the SoC's PCIe host implementation (whee!).  But I got help
> from the SoC manufacturer and was able to get some instructions for
> resetting their PCIe host controller.  When I do all the magic
> incantations in the right order, the system can recover, albeit with a
> fully reset ath10k chip.  This workaround is unfortunately specific to
> the host device platform so it won't do you much good.
>
> Of course, a good way to avoid the problem is "don't crash the
> firmware then," but that's not as robust as I'd like.

I never trust the firmware, in any device, and that's why I would like
ath10k to have a 100% reliable way to restart it from the host.

> This box is doing quite a few things, so rebooting to fix a problem on
> one of the wireless cards is pretty expensive.

Yeah, that would be really bad. Restarting the firmware will take
something like 1-2 seconds and the user would only notice a small pause
in data traffic, a much better solution than rebooting the whole box.

> Nevertheless, the warm reset changes really do reduce the frequency of
> this a lot, to the point where my workaround is almost never needed.
> Thanks for that!

Great, thanks for the feedback.

-- 
Kalle Valo
