ath10k: freeze after disconnection on killer1525

Ben Greear greearb at candelatech.com
Tue May 12 12:37:10 PDT 2015


On 05/11/2015 09:52 PM, Michal Kazior wrote:
> On 11 May 2015 at 23:17, Ben Greear <greearb at candelatech.com> wrote:
>> On 05/11/2015 06:30 AM, Michal Kazior wrote:
>>> On 11 May 2015 at 14:50, Gabriele Martino <g.martino at gmx.com> wrote:
>>>> Hi,
>>>> I'm using a Killer 1525 with hw2.1 firmware, and sometimes it stops working.
>>>> I can get it working again disconnecting and reconnecting, but sometimes
>>>> on disconnection it freezes for a long time:
>>>>
>>>> [ 2740.035190] dmar: DRHD: handling fault status reg 2
>>>> [ 2740.035195] dmar: DMAR:[DMA Read] Request device [03:00.0] fault addr ffbeb000
>>>>                 DMAR:[fault reason 06] PTE Read access is not set
>>>
>>> This looks like DMA tx pool memory address. I suspect
>>> firmware/hardware tried to access memory which was already unmapped by
>>> ath10k.
>>>
>>> If you're feeling lucky you could disable IOMMU - this should prevent
>>> it from crashing and disconnecting. However this is hardly a solution
>>> unless you're okay with the device reading random memory and doing
>>> *stuff* with it (plaintext password from RAM sent on the air, anyone?
>>> :-)
>>
>> I don't actually see a firmware crash here.  This looks a bit like the problem
>> I hit where the WMI transport basically hangs, but the firmware does not actually
>> crash.  (I don't remember seeing any DMAR issues in my case, not sure if
>> that is significant or not.)
>
> Firmware won't necessarily crash. I guess it depends on the IOMMU
> controller whether the device will actually crash per se, and qca6174
> is a little more forgiving of faulted host memory accesses. qca988x
> tends to just crash outright if it gets a DMAR fault.

So the FW is just wedged in this case, and will neither crash nor actually
handle commands properly?  That sounds like the worst possible
combination!

I guess one would need to hack ath10k to detect the repeated WMI timeouts and then
attempt to restart the NIC?
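
Something along these lines is what I have in mind.  This is only a rough
sketch in plain C, not actual ath10k code: the struct, the function names and
the threshold of 3 are all made up for illustration, and the real driver would
have to hook this into its WMI command path and its own restart logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical threshold: consecutive WMI timeouts tolerated before restart. */
#define WMI_TIMEOUT_RESTART_THRESHOLD 3

/* Hypothetical bookkeeping; not a structure that exists in ath10k. */
struct wmi_hang_detector {
	unsigned int consecutive_timeouts;
	bool restart_pending;
};

static void wmi_hang_detector_init(struct wmi_hang_detector *d)
{
	assert(d != NULL);
	d->consecutive_timeouts = 0;
	d->restart_pending = false;
}

/*
 * Record the outcome of one WMI command.
 * Returns true exactly once per hang, when the caller should kick off a
 * restart of the NIC.  A successful command resets the timeout counter.
 */
static bool wmi_hang_detector_record(struct wmi_hang_detector *d, bool timed_out)
{
	assert(d != NULL);

	if (!timed_out) {
		d->consecutive_timeouts = 0;
		return false;
	}

	/* A restart was already requested; do not request another one. */
	if (d->restart_pending)
		return false;

	d->consecutive_timeouts++;
	if (d->consecutive_timeouts >= WMI_TIMEOUT_RESTART_THRESHOLD) {
		d->restart_pending = true;
		return true;
	}
	return false;
}

/* Call once the restart has completed so detection is re-armed. */
static void wmi_hang_detector_restart_done(struct wmi_hang_detector *d)
{
	assert(d != NULL);
	d->consecutive_timeouts = 0;
	d->restart_pending = false;
}
```

The restart_pending flag is there so that a burst of timed-out commands only
requests one restart.  Whether 3 consecutive timeouts is a sensible cutoff
would need testing on real hardware.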

>> I added some keep-alive messages, busy polling, and firmware watchdog logic
>> to my kernel and firmware that seem to have effectively worked around
>> this problem.
>>
>> My kernels also have work-arounds for the hangs (FW watchdog will kill truly hung
>> firmware in about 5 seconds and then system should recover normally).
>>
>> Gabriele:  If you want to try my 3.17 kernel and CT firmware I'm curious to
>> see logs if you see similar problems.
>
> He's using qca6174, not qca988x. Your firmware does not apply in this case.

Ahh, my bad.  Thanks for clarifying.

Thanks,
Ben

-- 
Ben Greear <greearb at candelatech.com>
Candela Technologies Inc  http://www.candelatech.com



