[FS#1263] Archer C7 Periodic 2.4G WiFi Failure (ath10k_pci failed to parse phyerr tlv payload at byte 0)

LEDE Bugs lede-bugs at lists.infradead.org
Mon Jan 8 08:32:49 PST 2018


A new Flyspray task has been opened.  Details are below. 

User who did this - Ian MacDonald (imac) 

Attached to Project - LEDE Project
Summary - Archer C7 Periodic 2.4G WiFi Failure (ath10k_pci failed to parse phyerr tlv payload at byte 0)
Task Type - Bug Report
Category - Kernel
Status - Unconfirmed
Assigned To - 
Operating System - All
Severity - Medium
Priority - Very Low
Reported Version - lede-17.01
Due in Version - Undecided
Due Date - Undecided
Details - We run 17.01.4 (with latest package updates) on Archer C7 (4.4.92 mips / ath10k) 
   
We have several similar ArcherC7s that experience a periodic disconnect on 2.4G band. We are fairly certain it occurs on all of them, though about 5 of them actively see this issue where we have to deal with it.  

The result is no 2.4G devices can connect (no issue with 5G) until we execute "wifi" or reboot the ArcherC7s.

We have applied a cron job that runs each night and executes "wifi" to workaround this issue since 17.01.2 but believe it may have been present since 17.01.1. It is still present in 17.01.4 unfortunately so we have decided to commit some time to resolving. 

In one of our office locations, to monitor this bug with the hope of oneday resolving it, we do not run the cron job, and only have two 2.4G devices connected.

On this device where we have to manually intervene to work around the problem, in the last 52 days (current uptime), the 2.4G failure has happenened three times. 

Between the 1st and 2nd occurence we saw these SWBA messages in our dmesg. 

[2506537.784853] ath: phy1: DMA failed to stop in 10 ms AR_CR=0x00000024 AR_DIAG_SW=0x02100020 DMADBG_7=0x00028800
[2506539.854485] ath10k_pci 0000:01:00.0: SWBA overrun on vdev 0, skipped old beacon
[2506539.862178] ath10k_pci 0000:01:00.0: SWBA overrun on vdev 0, skipped old beacon
[2506539.869849] ath10k_pci 0000:01:00.0: SWBA overrun on vdev 0, skipped old beacon
[2506539.877511] ath10k_pci 0000:01:00.0: SWBA overrun on vdev 0, skipped old beacon
[2506539.885148] ath10k_pci 0000:01:00.0: SWBA overrun on vdev 0, skipped old beacon
[2506539.892821] ath10k_pci 0000:01:00.0: SWBA overrun on vdev 0, skipped old beacon
[2506539.900499] ath10k_pci 0000:01:00.0: SWBA overrun on vdev 0, skipped old beacon
[2506539.908171] ath10k_pci 0000:01:00.0: SWBA overrun on vdev 0, skipped old beacon
[2506539.915809] ath10k_pci 0000:01:00.0: SWBA overrun on vdev 0, skipped old beacon
[2506539.923492] ath10k_pci 0000:01:00.0: SWBA overrun on vdev 0, skipped old beacon

Before each occurence of the 2.4G failure, at some point we see these messages noting the three different timestamps:

[290147.737347] ath10k_pci 0000:01:00.0: failed to parse phyerr tlv payload at byte 0
[463582.984526] ath10k_pci 0000:01:00.0: failed to parse phyerr tlv payload at byte 0

[2794383.390501] ath10k_pci 0000:01:00.0: failed to parse phyerr tlv payload at byte 0
[2794532.486186] ath10k_pci 0000:01:00.0: failed to parse phyerr tlv payload at byte 0

[4541364.695287] ath10k_pci 0000:01:00.0: failed to parse phyerr tlv payload at byte 0
[4541500.990587] ath10k_pci 0000:01:00.0: failed to parse phyerr tlv payload at byte 0

The only other message that jumps out in the dmesg is the one below, but like thw SWBA, it is not associated with each occurence. 

[3311103.068324] ath: phy1: Unable to reset channel, reset status -5

The 2.4G clients are a D-LINK DCH-S150 motion detector (70:62:b8:93:98:b8) and a Google Chromecast (6C:AD:F8:4B:A3:52)

There is not much else in the dmesg (which I have attached), other then the bridge changes that occur when we execute "wifi" to resolve the issue which help identify which messages occur before resolving the issue each time (which can be days later as we only really notice when Chromecast is offline and we are trying to use it). 

Nothing jumps out of our logread (and has not in the past), but typically the buffer has wrapped (as was the case today) and provided no messages related to the 2.4G clients. 

Our 2.4G config is below, noting we do have a 40Mhz wide config enabled leveraging ht_coex and htmode. 

config wifi-device 'radio1'
	option type 'mac80211'
	option hwmode '11g'
	option path 'platform/qca955x_wmac'
	option channel '11'
	option htmode 'HT40-'
	option txpower '24'
	option ht_coex '0'
	option noscan '1'
	option country 'US'

config wifi-iface
	option device 'radio1'
	option network 'lan'
	option mode 'ap'
	option ssid 'xxxx'
	option key 'xxxx'
	option encryption 'psk2+ccmp'


One or more files have been attached.

More information can be found at the following URL:
https://bugs.lede-project.org/index.php?do=details&task_id=1263



More information about the lede-bugs mailing list