After shutdown/restart, ath10k sometimes stops receiving packets
Michal Kazior
michal.kazior at tieto.com
Fri Jun 13 02:49:23 PDT 2014
On 13 June 2014 11:37, Avery Pennarun <apenwarr at gmail.com> wrote:
> On Fri, Jun 13, 2014 at 5:14 AM, Michal Kazior <michal.kazior at tieto.com> wrote:
>> On 13 June 2014 01:37, Avery Pennarun <apenwarr at gmail.com> wrote:
>>> We are experiencing a relatively-rare-but-not-rare-enough case which
>>> has approximately these steps:
>>>
>>> - run a wifi AP for a while with a station or two connected
>>> - shut down hostapd (stations all get disconnected of course)
>>
>> Did you make sure you had no interface up and running? Did the
>> hardware go through warm/cold reset?
>
> No, I didn't see any logs about that. The firmware seemed happy.
When the last interface is brought down, mac80211 calls ath10k_stop(),
which in turn stops and resets the device. When you bring the first
interface up, mac80211 calls ath10k_start(), which boots the device (and
performs a reset too).
Saying "restart hostapd" doesn't tell me whether the driver went
through a stop/start cycle or not.
>>> - restart hostapd (perhaps on another channel or with different settings)
>>>
>>> After that, an external wifi sniffer can see beacons being transmitted
>>> by the AP as expected, but all packets from stations trying to connect
>>> are ignored. In particular, Probe requests are not answered, and Auth
>>> requests do not even receive a wifi ACK.
>>
>> This means WMI is working (each beacon is submitted via WMI). This
>> also implies CE works.
>>
>> But if there are no ACKs this suggests HW must've been instructed to
>> ignore frames somehow.
>>
>> (..) After taking a look I think wmi_vdev_start_request_cmd isn't
>> really handled properly for 10.x firmware. I'm guessing this ABI issue
>> might be the problem.
>>
>> 10.x firmware has:
>> struct wmi_channel chan;
>> __le32 vdev_id;
>> __le32 requestor_id;
>> __le32 num_noa_descriptors;
>> __le32 disable_hw_ack;
>> struct wmi_p2p_noa_descriptor noa_descriptors[2];
>>
>> disable_hw_ack overlaps with dtim_period. Perhaps that's the problem.
>>
>> It's intriguing how this hasn't manifested itself until now..
>
> I'm not sure what you mean by overlapping...
The 10.x and main firmware branches have ABI differences: some command
structures differ. vdev_start is one of them, but apparently this hasn't
been a problem for either firmware so far.
The 10th dword (dword = 32-bit word) of the command is disable_hw_ack
for 10.x, while for the main branch it is dtim_period.
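To make the overlap concrete, here is a minimal userspace sketch, not the driver's actual definitions. It assumes struct wmi_channel occupies six dwords (which is what the "10th dword" count implies), uses plain uint32_t instead of __le32, and gives the main-branch field in the 9th slot a placeholder name. All type and function names ending in _sketch, plus dword_index() and as_seen_by_10x(), are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for struct wmi_channel: six opaque dwords (assumption). */
struct chan_sketch {
	uint32_t dword[6];
};

/* Head of the command as 10.x firmware lays it out (fields from this thread). */
struct vdev_start_10x_sketch {
	struct chan_sketch chan;
	uint32_t vdev_id;
	uint32_t requestor_id;
	uint32_t num_noa_descriptors;
	uint32_t disable_hw_ack;
};

/* Head of the command as the main branch lays it out. */
struct vdev_start_main_sketch {
	struct chan_sketch chan;
	uint32_t vdev_id;
	uint32_t requestor_id;
	uint32_t slot9;		/* placeholder name, hypothetical */
	uint32_t dtim_period;
};

/* Convert a byte offset into a 1-based dword index. */
static unsigned int dword_index(size_t byte_offset)
{
	return (unsigned int)(byte_offset / sizeof(uint32_t)) + 1;
}

/*
 * Fill the command using the main-branch layout, then read the same
 * bytes back through the 10.x layout. Returns what a 10.x firmware
 * would see in disable_hw_ack.
 */
static uint32_t as_seen_by_10x(uint32_t dtim_period)
{
	struct vdev_start_main_sketch host;
	struct vdev_start_10x_sketch fw;

	/* Both heads must cover the same ten dwords for the copy to be valid. */
	assert(sizeof(host) == sizeof(fw));

	memset(&host, 0, sizeof(host));
	host.dtim_period = dtim_period;
	memcpy(&fw, &host, sizeof(fw));

	return fw.disable_hw_ack;
}
```

Under these assumptions both fields land on dword 10, so whatever the host writes as dtim_period is what a 10.x firmware would read as disable_hw_ack. Whether the firmware actually acts on that value is exactly what the printk experiment is meant to find out.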
>>> Restarting hostapd doesn't fix it. However, rmmoding and modprobing
>>> the ath10k_pci module does fix it.
>>
>> Did the hardware go through warm or cold reset?
>
> Reloading the driver, I believe it does a warm reset. I know it's not
> doing a cold reset because that tends to crash my machine :)
Just because it tends to crash doesn't mean it always does, right? A
cold reset may still be happening unless you disabled it.
>>> This is with a mindspeed c2k host processor, 3.2 kernel, and modules
>>> backported by backports from kvalo's ath-next as of
>>> v3.15-rc1-237-gd9bc4b9 (roughly 2014-04-29). Firmware is
>>> 10.1.467.2-1.
>>>
>>> Has anyone else seen this? Any suggestions where to look to narrow
>>> down the problem?
>>
>> I haven't seen anything like this.
>>
>> For one reloading the driver may tickle cfg80211 and regulatory
>> updates but I just can't imagine how that could cripple ACKing.
>
> Just to clarify, reloading the driver fixes it, it doesn't cripple it.
> It's restarting hostapd that seems to cripple it.
Yeah, sorry. I meant I can't imagine how regulatory could be involved
in crippling ACKing.
>>> I can't exactly produce it on demand yet, but if someone suggests
>>> things to look for when it happens, it occurs often enough that I
>>> should be able to run those things.
>>
>> You might want to print out the vdev_start command's dtim_period
>> (which overlaps with 10.x's disable_hw_ack) before it is sent to fw
>> and compare the value for when ACKing works and when it doesn't.
>
> Do you mean printing the dtim_period we are sending to the firmware,
> from inside vdev_start(), and watching to see if it is ever different
> than expected?
I think `printk("dtim period: %d\n", arg->dtim_period);` in
ath10k_wmi_vdev_start_restart() should be enough.
Michał