Hard lockup during vif restart tests.

Ben Greear greearb at candelatech.com
Wed Sep 17 08:52:18 PDT 2014


On 09/16/2014 11:34 PM, Michal Kazior wrote:
> On 16 September 2014 20:42, Ben Greear <greearb at candelatech.com> wrote:
>> This is on a 3.14.14+ hacked kernel, with CT firmware.
>>
>> Test case is to restart stations (and the AP
>> on the other side) every 10-30 seconds.
>> After a bit, the station machine locked up hard.
>>
>> I have no idea how to troubleshoot this better, so this is
>> just FYI.
>>
> [...]
>> ath10k: boot warm reset complete
>> ath10k: failed to power up target using warm reset: -110
>> ath10k: trying cold reset
>> ath10k: boot cold reset
>> ath10k: boot cold reset complete
>> [hang, even sysrq will not work]
> 
> There's a known problem with cold reset being capable of locking up
> entire system (depends on the pci-e controller, e.g. AP135 splats a
> Data Bus Error instead).
> 
> Actually warm reset can do the same in some corner cases: try running
> Rx traffic and just start the recovery sequence (without actually
> crashing the fw). My x86 locks up very easily with this.
> 
> I strongly suggest you use reset_mode=1 when you load ath10k_pci so
> cold reset isn't used. This may result in ath10k being unable to bring
> up the device in some rare cases (e.g. after an IOMMU fault if your
> system supports it) but I believe it's far better than having the
> whole system lock up.
> 
> My suspicion is tx/rx rings, dma transfer engines, internal irqs
> aren't stopped properly. I have a prototype patch for the warm reset
> problem but it's incomplete and I'm not sure if I can share it yet.

I will try the warm-reset-only flag, and I do hope you have success
with the warm/cold reset fixes.
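
For anyone else hitting this, here is a minimal sketch of making the
reset_mode=1 option persistent across module loads. The option name and
value come from the suggestion above; the file name ath10k_pci.conf and
the CONF_DIR variable are only illustrative assumptions, and the script
writes to a scratch directory unless CONF_DIR is set, so it can be tried
without root.

```shell
#!/bin/sh
# Sketch only: write a modprobe.d fragment asking ath10k_pci to use
# warm reset only (reset_mode=1), as suggested earlier in this thread.
# CONF_DIR is a hypothetical knob; on a real system it would typically
# be /etc/modprobe.d, which needs root to write to.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"
mkdir -p "$CONF_DIR"

# "options <module> <param>=<value>" is the standard modprobe.d syntax.
printf 'options ath10k_pci reset_mode=1\n' > "$CONF_DIR/ath10k_pci.conf"

# Show what was written and where.
echo "wrote $CONF_DIR/ath10k_pci.conf:"
cat "$CONF_DIR/ath10k_pci.conf"
```

The fragment only takes effect the next time ath10k_pci is loaded, so
the module has to be removed and loaded again (or the machine rebooted)
before the warm-reset-only behavior applies.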

But I still wonder whether we could simply reset less often, which
would make these problems harder to hit.

Why do we reset the firmware/NIC when we admin down/up the
vif (when a single vif is active)?  Couldn't we just keep
the firmware active in this state and not risk lockup due
to reset?

Thanks,
Ben


-- 
Ben Greear <greearb at candelatech.com>
Candela Technologies Inc  http://www.candelatech.com



