[FS#633] 802.1ad (QinQ) VLANs broken since kernel 4.3 in ramips mt7620

LEDE Bugs lede-bugs at lists.infradead.org
Thu Mar 16 05:23:57 PDT 2017


A new Flyspray task has been opened.  Details are below. 

User who did this - rogerpueyo (rogerpueyo) 

Attached to Project - LEDE Project
Summary - 802.1ad (QinQ) VLANs broken since kernel 4.3 in ramips mt7620
Task Type - Bug Report
Category - Kernel
Status - Unconfirmed
Assigned To - 
Operating System - All
Severity - High
Priority - Normal
Reported Version - All
Due in Version - Undecided
Due Date - Undecided
Details - Hi,

Since OpenWrt commit [[https://github.com/openwrt/openwrt/commit/930035f9fa98f438a71337f707fc43316318a1be|930035f9fa98f438a71337f707fc43316318a1be - ralink: bump to the target to v4.3]], 802.1ad (QinQ) VLANs seem not to be working properly on the MT7620 ramips chip.

====Setup====
I have four heterogeneous devices connected to a hub (see attached image) via the following interfaces:


  * PC Engines APU (x86_64) => eth2
  * Xiaomi MiWiFi Mini (ramips **mt7620**) => eth0.2 (blue port)
  * Ubiquiti NanoStation M5 XW (ar71xx) => eth0.2 (Secondary port)
  * Wavlink WL-WN575A3 (ramips mt7628) => eth0.2 (WAN port)

====Network configuration====

On top of these interfaces I've created the 802.1ad VLAN 12. On the APU, the VLAN sits directly on top of the eth2 interface. On the other devices, where the physical ports are connected to the internal switch, the VLAN sits on top of the eth0.2 interface:

  * APU (eth2)

config device 'eth2_12'
        option type '8021ad'
        option name 'eth2_12'
        option ifname 'eth2'
        option vid '12'
        option proto 'static'
        option ip6addr 'FD02:0:0:DC9F:DB4F:A1F6:1:112/128'

config interface 'eth2_12_ad'
        option ifname 'eth2_12'
        option auto '1'


  * MiWiFiMini (eth0.2)

config device 'eth0_2_12'
        option type '8021ad'
        option name 'eth0_2_12'
        option ifname 'eth0.2'
        option vid '12'
        option proto 'static'
        option ip6addr 'FD02:0:0:DC9F:DB4F:A1F6:2:112/128'

config interface 'eth0_2_12_ad'
        option ifname 'eth0_2_12'
        option auto '1'


  * WL-WN575A3 (eth0.2)

config device 'eth0_2_12'
        option type '8021ad'
        option name 'eth0_2_12'
        option ifname 'eth0.2'
        option vid '12'
        option proto 'static'
        option ip6addr 'FD02:0:0:DC9F:DB4F:A1F6:3:112/128'

config interface 'eth0_2_12_ad'
        option ifname 'eth0_2_12'
        option auto '1'


  * NanoM5XW (eth0.2)

config device 'eth0_2_12'
        option type '8021ad'
        option name 'eth0_2_12'
        option ifname 'eth0.2'
        option vid '12'
        option proto 'static'
        option ip6addr 'FD02:0:0:DC9F:DB4F:A1F6:4:112/128'

config interface 'eth0_2_12_ad'
        option ifname 'eth0_2_12'
        option auto '1'
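
For reference, the 802.1ad devices configured above should put a 4-byte service tag with TPID 0x88a8 and VID 12 on each frame, whereas a plain 802.1Q VLAN such as eth0.2 uses TPID 0x8100. The Python sketch below builds and parses such a tag. The function names are made up for illustration, and priority (PCP) and DEI are assumed to be 0, which the configuration does not state.

```python
import struct

TPID_8021AD = 0x88A8  # service tag (S-tag), IEEE 802.1ad
TPID_8021Q = 0x8100   # customer tag (C-tag), IEEE 802.1Q


def vlan_tag(tpid, vid, pcp=0, dei=0):
    """Return the 4-byte VLAN tag: 16-bit TPID followed by 16-bit TCI."""
    if not 0 <= vid <= 0xFFF:
        raise ValueError("VID must fit in 12 bits")
    if not 0 <= pcp <= 7 or dei not in (0, 1):
        raise ValueError("PCP is 3 bits, DEI is 1 bit")
    tci = (pcp << 13) | (dei << 12) | vid
    return struct.pack("!HH", tpid, tci)


def parse_vlan_tag(tag):
    """Split a 4-byte VLAN tag into (tpid, pcp, dei, vid)."""
    tpid, tci = struct.unpack("!HH", tag)
    return tpid, tci >> 13, (tci >> 12) & 1, tci & 0xFFF


# Tag expected for "option type '8021ad'" with "option vid '12'"
# (PCP 0 and DEI 0 are assumptions).
s_tag = vlan_tag(TPID_8021AD, 12)
print(s_tag.hex())            # 88a8000c
print(parse_vlan_tag(s_tag))  # (34984, 0, 0, 12)
```

Comparing these bytes against a capture taken on the underlying interface (eth2 or eth0.2) is one way to check that the tag itself is emitted correctly.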


The devices can ping6 each other using the link-local addresses. For example:
  * APU

root@LEDE:/# ping6 ff02::1%eth2_12
PING ff02::1%eth2_12 (ff02::1%eth2_12): 56 data bytes
64 bytes from fe80::20d:b9ff:fe33:98d6: seq=0 ttl=64 time=0.471 ms
64 bytes from fe80::823f:5dff:fea8:557f: seq=0 ttl=64 time=0.859 ms (DUP!)
64 bytes from fe80::f2b4:29ff:fe60:902a: seq=0 ttl=64 time=1.028 ms (DUP!)
64 bytes from fe80::46d9:e7ff:fe47:8292: seq=0 ttl=64 time=1.061 ms (DUP!)

  * MiWiFi

root@MiWiFi:/# ping6 ff02::1%eth0_2_12
PING ff02::1%eth0_2_12 (ff02::1%eth0_2_12): 56 data bytes
64 bytes from fe80::f2b4:29ff:fe60:902a: seq=0 ttl=64 time=0.480 ms
64 bytes from fe80::823f:5dff:fea8:557f: seq=0 ttl=64 time=1.140 ms (DUP!)
64 bytes from fe80::46d9:e7ff:fe47:8292: seq=0 ttl=64 time=1.340 ms (DUP!)
64 bytes from fe80::20d:b9ff:fe33:98d6: seq=0 ttl=64 time=1.420 ms (DUP!)


====Problem====
**Packets sent by the Xiaomi MiWiFi Mini (ramips MT7620) over the 802.1ad VLAN have incorrect checksums**. As a result, TCP communication between this device and the others fails when 802.1ad VLANs are used (ICMPv6, such as the ping6 above, still works).

===Log===

For instance, an SSH session opened from the Xiaomi MiWiFi Mini to another device using its link-local IPv6 address on the eth0_2_12 interface (ssh root@fe80::823f:5dff:fea8:557f%eth0_2_12) fails. Here is the tcpdump of the failed session, showing the error (keyword: **incorrect**):

root@LEDE:/# tcpdump -i eth2_12 -vv
[ 2677.172819] device eth2_12 entered promiscuous mode
[ 2677.177785] device eth2 entered promiscuous mode
tcpdump: listening on eth2_12, link-type EN10MB (Ethernet), capture size 262144 bytes
12:12:02.453576 IP6 (flowlabel 0x3c378, hlim 64, next-header TCP (6) payload length: 40) fe80::f2b4:29ff:fe60:902a.58062 > fe80::823f:5dff:fea8:557f.22: Flags [S], cksum 0xdcd6 (incorrect -> 0xfe42), seq 4255601391, win 28800, options [mss 1440,sackOK,TS val 182573 ecr 0,nop,wscale 4], length 0
12:12:03.451223 IP6 (flowlabel 0x3c378, hlim 64, next-header TCP (6) payload length: 40) fe80::f2b4:29ff:fe60:902a.58062 > fe80::823f:5dff:fea8:557f.22: Flags [S], cksum 0xdcd6 (incorrect -> 0xfdde), seq 4255601391, win 28800, options [mss 1440,sackOK,TS val 182673 ecr 0,nop,wscale 4], length 0
12:12:05.451007 IP6 (flowlabel 0x6a388, hlim 64, next-header TCP (6) payload length: 40) fe80::f2b4:29ff:fe60:902a.58062 > fe80::823f:5dff:fea8:557f.22: Flags [S], cksum 0xdcd6 (incorrect -> 0xfd16), seq 4255601391, win 28800, options [mss 1440,sackOK,TS val 182873 ecr 0,nop,wscale 4], length 0
12:12:07.460776 IP6 (hlim 255, next-header ICMPv6 (58) payload length: 32) fe80::f2b4:29ff:fe60:902a > fe80::823f:5dff:fea8:557f: [icmp6 sum ok] ICMP6, neighbor solicitation, length 32, who has fe80::823f:5dff:fea8:557f
	  source link-address option (1), length 8 (1): f0:b4:29:60:90:2a
	    0x0000:  f0b4 2960 902a
12:12:07.461197 IP6 (hlim 255, next-header ICMPv6 (58) payload length: 24) fe80::823f:5dff:fea8:557f > fe80::f2b4:29ff:fe60:902a: [icmp6 sum ok] ICMP6, neighbor advertisement, length 24, tgt is fe80::823f:5dff:fea8:557f, Flags [router, solicited]
12:12:09.460943 IP6 (flowlabel 0x2b6f7, hlim 64, next-header TCP (6) payload length: 40) fe80::f2b4:29ff:fe60:902a.58062 > fe80::823f:5dff:fea8:557f.22: Flags [S], cksum 0xdcd6 (incorrect -> 0xfb85), seq 4255601391, win 28800, options [mss 1440,sackOK,TS val 183274 ecr 0,nop,wscale 4], length 0
12:12:12.469016 IP6 (hlim 255, next-header ICMPv6 (58) payload length: 32) fe80::823f:5dff:fea8:557f > fe80::f2b4:29ff:fe60:902a: [icmp6 sum ok] ICMP6, neighbor solicitation, length 32, who has fe80::f2b4:29ff:fe60:902a
	  source link-address option (1), length 8 (1): 80:3f:5d:a8:55:7f
	    0x0000:  803f 5da8 557f
12:12:12.469101 IP6 (hlim 255, next-header ICMPv6 (58) payload length: 24) fe80::f2b4:29ff:fe60:902a > fe80::823f:5dff:fea8:557f: [icmp6 sum ok] ICMP6, neighbor advertisement, length 24, tgt is fe80::f2b4:29ff:fe60:902a, Flags [router, solicited]


However, an SSH session outside the 802.1ad VLAN (e.g. via eth0.2 directly: ssh root@fe80::823f:5dff:fea8:557f%eth0.2) succeeds.
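
One detail in the tcpdump capture above stands out: the checksum field is 0xdcd6 in every SYN, even though the TS val option (and therefore the correct checksum) changes between retransmissions. The Python sketch below sums only the IPv6 pseudo-header, using the source address, destination address, payload length (40) and next-header value (6) from the capture. The function names are illustrative. The result equals the value seen on the wire, which would be consistent with the stack leaving the partial pseudo-header sum in the TCP header for the hardware to complete, and the checksum offload not completing it for 802.1ad-tagged frames. This is an interpretation of the capture, not a confirmed root cause.

```python
import ipaddress
import struct


def fold(total):
    """Fold a sum into 16 bits using one's-complement (end-around carry)."""
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def pseudo_header_sum(src, dst, upper_len, next_header):
    """One's-complement sum of the IPv6 pseudo-header, not inverted."""
    pseudo = (
        ipaddress.IPv6Address(src).packed
        + ipaddress.IPv6Address(dst).packed
        + struct.pack("!I", upper_len)    # upper-layer packet length
        + struct.pack("!I", next_header)  # three zero bytes + next header
    )
    words = struct.unpack("!%dH" % (len(pseudo) // 2), pseudo)
    return fold(sum(words))


# Values copied from the SYN packets in the capture.
partial = pseudo_header_sum(
    "fe80::f2b4:29ff:fe60:902a",  # MiWiFi Mini (sender)
    "fe80::823f:5dff:fea8:557f",  # SSH target
    40,                           # payload length reported by tcpdump
    6,                            # next header: TCP
)
print(hex(partial))  # 0xdcd6, the "incorrect" cksum shown by tcpdump
```

If this reading is right, disabling TX checksum offload on the MT7620 device would be a natural experiment, although it has not been tried in this report.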

====Other devices affected====
The problem affects other MT7620-based devices (e.g. Nexx WT3020, ZBT-APE522II), which show the same behaviour.

====Causes====
I've identified OpenWrt commit [[https://github.com/openwrt/openwrt/commit/930035f9fa98f438a71337f707fc43316318a1be|930035f9fa98f438a71337f707fc43316318a1be - ralink: bump to the target to v4.3]] from December 2015 as the cause of this bug. An image built from the preceding commit does not show the problem described.

One or more files have been attached.

More information can be found at the following URL:
https://bugs.lede-project.org/index.php?do=details&task_id=633


