[LEDE-DEV] Transmit timeouts with mtk_eth_soc and MT7621

Kristian Evensen kristian.evensen at gmail.com
Fri Jul 21 04:04:26 PDT 2017


Hello,

I am currently facing an issue on some MT7621-based devices (ZBT
WG2626/2926 and 3526) that, after days of fruitless digging, I am
unable to solve myself. I have run out of places to look and things
to test. Most devices are running LEDE 17.01, while some are running
the latest snapshot.

In some networks where I have placed the routers, I get frequent
reports about connection loss or interruptions. Looking at the logs on
the problem-routers, I see the following message:

[10203959.920000] ------------[ cut here ]------------
[10203959.920000] WARNING: CPU: 3 PID: 0 at net/sched/sch_generic.c:306 dev_watchdog+0x258/0x2fc()
[10203959.930000] NETDEV WATCHDOG: eth0 (mtk_soc_eth): transmit queue 0 timed out
[10203959.940000] Modules linked in: rtl8192cu qcserial ppp_async
option iptable_nat usb_wwan rtl_usb rtl8192c_common rt2800usb
rt2800pci rt2800mmio rt2800lib rndis_host qmi_wwan ppp_generic
nf_nat_pptp nf_nat_ipv4 nf_nat_amanda
nf_conntrack_pptp nf_conntrack_ipv6 nf_conntrack_ipv4
nf_conntrack_amanda ipt_REJECT ipt_MASQUERADE huawei_cdc_ncm cdc_ncm
cdc_ether ax88179_178a ath9k_htc ath9k_common asix xt_time xt_tcpudp
xt_tcpmss xt_string
xt_statistic xt_state xt_recent xt_quota xt_pkttype xt_physdev
xt_owner xt_nfacct xt_nat xt_multiport xt_mark xt_mac xt_limit
xt_length xt_id xt_hl xt_helper xt_ecn xt_dscp xt_conntrack
xt_connmark xt_connlimit xt_connbytes
xt_comment xt_addrtype xt_TCPMSS xt_REDIRECT xt_NFQUEUE
xt_LOG xt_HL xt_DSCP xt_CT xt_CLASSIFY usbserial usbnet ts_kmp ts_fsm
ts_bm slhc rtlwifi rt2x00usb rt2x00pci rt2x00mmio rt2x00lib
nfnetlink_queue nfnetlink_acct
nf_reject_ipv4 nf_nat_tftp nf_nat_snmp_basic nf_nat_sip
nf_nat_redirect nf_nat_proto_gre nf_nat_masquerade_ipv4 nf_nat_irc
nf_nat_h323 nf_nat_ftp nf_nat nf_log_ipv4 nf_defrag_ipv6
nf_defrag_ipv4 nf_conntrack_tftp
nf_conntrack_snmp nf_conntrack_sip nf_conntrack_rtcache
nf_conntrack_proto_gre nf_conntrack_netlink nf_conntrack_irc
nf_conntrack_h323 nf_conntrack_ftp nf_conntrack_broadcast
iptable_rawpost iptable_raw iptable_mangle
iptable_filter ipt_ECN ipheth ip_tables ip6table_rawpost
crc_itu_t crc_ccitt compat_xtables cdc_wdm cdc_acm br_netfilter
ath9k_hw ath act_connmark nf_conntrack act_skbedit act_mirred em_u32
cls_u32 cls_tcindex cls_flow
cls_route cls_fw sch_hfsc sch_ingress mt7603e mt76x2e mt76
mac80211 cfg80211 compat ledtrig_usbdev xt_set ip_set_list_set
ip_set_hash_netiface ip_set_hash_netport ip_set_hash_netnet
ip_set_hash_net ip_set_hash_netportnet
ip_set_hash_mac ip_set_hash_ipportnet ip_set_hash_ipportip
ip_set_hash_ipport ip_set_hash_ipmark ip_set_hash_ip
ip_set_bitmap_port ip_set_bitmap_ipmac ip_set_bitmap_ip ip_set
nfnetlink ip6t_REJECT nf_reject_ipv6
nf_log_ipv6 nf_log_common ip6table_raw ip6table_mangle
ip6table_filter ip6_tables x_tables ifb tun eeprom_93cx6 leds_gpio
xhci_mtk xhci_plat_hcd xhci_pci xhci_hcd uhci_hcd ohci_platform
ohci_hcd ehci_platform ehci_hcd
sd_mod scsi_mod gpio_button_hotplug usbcore nls_base usb_common mii
cryptomgr aead crypto_null crypto_hash
[10203960.140000] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.4.15 #0
[10203960.140000] Stack : 00000000 00000000 804a6862 00000033 00000000
00000000 80450000 804c0000
[10203960.140000]  8fc43f10 80451c63 803d0cb0 00000003 00000000
804a367c ffffffff 00000200
[10203960.140000]  00100000 800639a4 00000000 00000001 80450000
804c0000 803d5590 8fc65c0c
[10203960.140000]  803d7784 800616f0 802b1444 000000e0 ffffffff
00000000 8fc65c0c 8fc43c28
[10203960.140000]  00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000
[10203960.140000]  ...
[10203960.180000] Call Trace:
[10203960.180000] [<800165f4>] show_stack+0x50/0x84
[10203960.180000] [<801adabc>] dump_stack+0x84/0xbc
[10203960.190000] [<8002bea0>] warn_slowpath_common+0xa0/0xd0
[10203960.190000] [<8002befc>] warn_slowpath_fmt+0x2c/0x38
[10203960.200000] [<802b1444>] dev_watchdog+0x258/0x2fc
[10203960.200000] [<80072904>] call_timer_fn.isra.4+0x24/0x80
[10203960.210000] [<80072b5c>] run_timer_softirq+0x1fc/0x25c
[10203960.220000] [<8002ea2c>] __do_softirq+0x294/0x2e0
[10203960.220000] [<8002ecf4>] irq_exit+0x68/0x84
[10203960.220000] [<801d77e8>] plat_irq_dispatch+0xb4/0xdc
[10203960.230000] [<80005430>] ret_from_irq+0x0/0x4
[10203960.230000] [<80013380>] r4k_wait_irqoff+0x18/0x20
[10203960.240000] [<8005d6a4>] cpu_startup_entry+0x124/0x1b8
[10203960.240000] [<8001ae0c>] start_secondary+0x410/0x440
[10203960.250000]
[10203960.250000] ---[ end trace 51ca218f6e656690 ]---
[10203960.260000] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
[10203960.260000] mtk_soc_eth 1e100000.ethernet eth0: dma_cfg:80000065
[10203960.270000] mtk_soc_eth 1e100000.ethernet eth0: tx_ring=0, base=0ef10000, max=512, ctx=118, dtx=118, fdx=117, next=118
[10203960.280000] mtk_soc_eth 1e100000.ethernet eth0: rx_ring=0, base=0ef12000, max=512, calc=380, drx=381

The error seems to strike at random, and the offsets and values
reported by mtk_soc_eth differ between routers. The impact of the
error also differs between networks. In some networks, the users only
lose connectivity for a short period of time. In others, network
connectivity is lost (both WAN and LAN) until the router is rebooted.
On the routers with complete connectivity loss, the mtk_soc_eth part
of the error message above keeps looping over and over (with the same
values).
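
One way to check whether the repeated dumps really carry identical
values is to pull them out of a saved kernel log. The following is a
minimal Python sketch, not part of the driver or of LEDE. The names
tx_ring_dumps and is_stuck are hypothetical, and the pattern assumes
the tx_ring line format shown in the log above.

```python
import re

# Assumed format, taken from the log above:
# "tx_ring=0, base=0ef10000, max=512, ctx=118, dtx=118, fdx=117, next=118"
TX_RING_RE = re.compile(
    r"tx_ring=(?P<ring>\d+),\s*base=(?P<base>[0-9a-f]+),\s*max=(?P<max>\d+),"
    r"\s*ctx=(?P<ctx>\d+),\s*dtx=(?P<dtx>\d+),\s*fdx=(?P<fdx>\d+),"
    r"\s*next=(?P<next>\d+)"
)


def tx_ring_dumps(log_text):
    """Return one dict of string fields per tx_ring dump in the log text."""
    # Collapse all whitespace so dumps that a mail client wrapped
    # across two lines still match the pattern.
    flat = " ".join(log_text.split())
    return [m.groupdict() for m in TX_RING_RE.finditer(flat)]


def is_stuck(dumps):
    """True if there are at least two dumps and all have identical values."""
    return len(dumps) >= 2 and all(d == dumps[0] for d in dumps)
```

Feeding it the saved output of dmesg or logread and printing
is_stuck(tx_ring_dumps(text)) gives a quick yes/no on whether the ring
indices move at all between timeouts.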

When looking at the driver, I see what happens, but I can't figure out
why. There are no special devices connected to any of the networks,
or at least the networks seem ordinary to me. I have been trying to
reproduce the issue myself by doing all sorts of stupid stuff
(including cutting RJ45 cables), but I have been unable to trigger
the error. Googling the error message turned up very little, but I
found a reference to an OpenWrt issue containing the same error
message. The author of that issue reported that removing the SD card
driver had solved the problem for him, but my images do not include
this driver.

Does anyone have any idea what could be wrong, how to solve the
problem, or where to look further?

Thanks in advance for any help,
Kristian


