realtek: Setup all VLANs with default configurations

Bjørn Mork bjorn at mork.no
Sat May 8 16:39:05 BST 2021


Birger Koblitz <mail at birger-koblitz.de> writes:

> I tested the latest master on my 3 Zyxel devices, one for each SoC
> generation, the GS1900-10HP, the GS1900-48 and the XGS-1210-10. They
> got their IP addresses via DHCP on port 1 using the default network
> setup. I then tested pinging, multicast (VLC) and iperf between
> different non-management ports (2 and 3) and for multicast verified
> that nothing could be seen on the router's CPU-Port nor the management
> PC. I did not test anything specific to VLAN.

I don't think it's sufficient to test without VLAN tagging when messing
around with the VLAN config on these switches. Experience has shown that
there are a number of possible and real failure modes, where you forward
with the wrong tagging or filter on the wrong VID.


But I'm not convinced it works so well for the default either...  I see
this received on the native VLAN 1:


canardo:/tmp# tshark -nVi xeth0 -f 'udp port 67'
Running as user "root" and group "root". This could be dangerous.
Capturing on 'xeth0'
Frame 1: 346 bytes on wire (2768 bits), 346 bytes captured (2768 bits) on interface 0
    Interface id: 0 (xeth0)
        Interface name: xeth0
    Encapsulation type: Ethernet (1)
    Arrival Time: May  8, 2021 16:56:20.581072322 CEST
    [Time shift for this packet: 0.000000000 seconds]
    Epoch Time: 1620485780.581072322 seconds
    [Time delta from previous captured frame: 0.000000000 seconds]
    [Time delta from previous displayed frame: 0.000000000 seconds]
    [Time since reference or first frame: 0.000000000 seconds]
    Frame Number: 1
    Frame Length: 346 bytes (2768 bits)
    Capture Length: 346 bytes (2768 bits)
    [Frame is marked: False]
    [Frame is ignored: False]
    [Protocols in frame: eth:ethertype:ip:udp:bootp]
Ethernet II, Src: bc:a5:11:9f:e1:23, Dst: ff:ff:ff:ff:ff:ff
    Destination: ff:ff:ff:ff:ff:ff
        Address: ff:ff:ff:ff:ff:ff
        .... ..1. .... .... .... .... = LG bit: Locally administered address (this is NOT the factory default)
        .... ...1 .... .... .... .... = IG bit: Group address (multicast/broadcast)
    Source: bc:a5:11:9f:e1:23
        Address: bc:a5:11:9f:e1:23
        .... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
        .... ...0 .... .... .... .... = IG bit: Individual address (unicast)
    Type: IPv4 (0x0800)
    Frame check sequence: 0x80091000 incorrect, should be 0x63ff5c43
        [Expert Info (Error/Checksum): Bad checksum [should be 0x63ff5c43]]
            [Bad checksum [should be 0x63ff5c43]]
            [Severity level: Error]
            [Group: Checksum]
    [FCS Status: Bad]
Internet Protocol Version 4, Src: 0.0.0.0, Dst: 255.255.255.255
    0100 .... = Version: 4
    .... 0101 = Header Length: 20 bytes (5)
    Differentiated Services Field: 0x00 (DSCP: CS0, ECN: Not-ECT)
        0000 00.. = Differentiated Services Codepoint: Default (0)
        .... ..00 = Explicit Congestion Notification: Not ECN-Capable Transport (0)
    Total Length: 328
    Identification: 0x0000 (0)
    Flags: 0x0000
        0... .... .... .... = Reserved bit: Not set
        .0.. .... .... .... = Don't fragment: Not set
        ..0. .... .... .... = More fragments: Not set
    Fragment offset: 0
    Time to live: 64
    Protocol: UDP (17)
    Header checksum: 0x79a6 [validation disabled]
    [Header checksum status: Unverified]
    Source: 0.0.0.0
    Destination: 255.255.255.255
User Datagram Protocol, Src Port: 68, Dst Port: 67
    Source Port: 68
    Destination Port: 67
    Length: 308
    Checksum: 0xd82b [unverified]
    [Checksum Status: Unverified]
 
[etc]



That seemingly magic number 0x80091000 in place of the expected FCS is
pretty suspicious. Is a tag leaking?  Or what?
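
To make the "leaking tag" question a bit more concrete, here is a minimal
Python sketch of how those four bytes would split if they were read as an
802.1Q-style tag (a 16-bit TPID followed by a 16-bit TCI). That reading is
purely an assumption. Nothing in the capture confirms it, and the helper
name is made up for illustration.

```python
import struct

def split_as_dot1q(trailer: int) -> dict:
    """Interpret a 32-bit value as TPID + TCI, the layout of an 802.1Q tag.

    Hypothetical helper: the capture does not confirm that the trailer
    really is a tag of this form.
    """
    tpid, tci = struct.unpack("!HH", trailer.to_bytes(4, "big"))
    return {
        "tpid": tpid,
        "pcp": tci >> 13,          # priority code point, top 3 bits
        "dei": (tci >> 12) & 0x1,  # drop eligible indicator (formerly CFI)
        "vid": tci & 0x0FFF,       # VLAN ID, low 12 bits
    }

fields = split_as_dot1q(0x80091000)
print({name: hex(value) for name, value in fields.items()})
print("standard 802.1Q TPID:", fields["tpid"] == 0x8100)
```

The upper half comes out as 0x8009 rather than 0x8100, so this is not a
plain 802.1Q tag. Whether it is some switch-specific tag is a separate
question that this sketch cannot answer.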

> I merely verified via
> "bridge fdb" that the entries in the forwarding table now use the
> VLAN as forwarding ID together with the target MAC address. Before
> that they were using a fixed and very strange forwarding ID (4031 on
> the 8380) together with a MAC address. At the moment I do not have
> access to these devices (in summer house), only a buggy DLink DGS
> 1210-10P that does not want to bridge between ports with OpenWrt.
> I added an interesting debug facility in my "pie" branch. Copy
> debugfs.c over and you can use
>
>     cat /sys/kernel/debug/rtl838x/drop_counters
>
> to see why the switch drops packets.




root@OpenWrt:/# cat /sys/kernel/debug/rtl838x/drop_counters
ALE_TX_GOOD_PKTS: 74
MAC_RX_DROP: 0
ACL_FWD_DROP: 0
HW_ATTACK_PREVENTION_DROP: 0
RMA_DROP: 0
VLAN_IGR_FLTR_DROP: 0
INNER_OUTER_CFI_EQUAL_1_DROP: 0
PORT_MOVE_DROP: 0
NEW_SA_DROP: 0
MAC_LIMIT_SYS_DROP: 0
MAC_LIMIT_VLAN_DROP: 0
MAC_LIMIT_PORT_DROP: 0
SWITCH_MAC_DROP: 0
ROUTING_EXCEPTION_DROP: 0
DA_LKMISS_DROP: 0
RSPAN_DROP: 0
ACL_LKMISS_DROP: 0
ACL_DROP: 0
INBW_DROP: 0
IGR_METER_DROP: 0
ACCEPT_FRAME_TYPE_DROP: 0
STP_IGR_DROP: 3
INVALID_SA_DROP: 0
SA_BLOCKING_DROP: 0
DA_BLOCKING_DROP: 0
L2_INVALID_DPM_DROP: 0
MCST_INVALID_DPM_DROP: 0
RX_FLOW_CONTROL_DROP: 0
STORM_SPPRS_DROP: 0
LALS_DROP: 0
VLAN_EGR_FILTER_DROP: 33
STP_EGR_DROP: 0
SRC_PORT_FILTER_DROP: 10
PORT_ISOLATION_DROP: 0
ACL_FLTR_DROP: 0
MIRROR_FLTR_DROP: 0
TX_MAX_DROP: 0
LINK_DOWN_DROP: 0
FLOW_CONTROL_DROP: 0
BRIDGE .1d discards: 0
root@OpenWrt:/# ping 192.168.100.111
PING 192.168.100.111 (192.168.100.111): 56 data bytes
^C
--- 192.168.100.111 ping statistics ---
5 packets transmitted, 0 packets received, 100% packet loss
root@OpenWrt:/# cat /sys/kernel/debug/rtl838x/drop_counters
ALE_TX_GOOD_PKTS: 35
MAC_RX_DROP: 0
ACL_FWD_DROP: 0
HW_ATTACK_PREVENTION_DROP: 0
RMA_DROP: 0
VLAN_IGR_FLTR_DROP: 0
INNER_OUTER_CFI_EQUAL_1_DROP: 0
PORT_MOVE_DROP: 0
NEW_SA_DROP: 0
MAC_LIMIT_SYS_DROP: 0
MAC_LIMIT_VLAN_DROP: 0
MAC_LIMIT_PORT_DROP: 0
SWITCH_MAC_DROP: 0
ROUTING_EXCEPTION_DROP: 0
DA_LKMISS_DROP: 0
RSPAN_DROP: 0
ACL_LKMISS_DROP: 0
ACL_DROP: 0
INBW_DROP: 0
IGR_METER_DROP: 0
ACCEPT_FRAME_TYPE_DROP: 0
STP_IGR_DROP: 0
INVALID_SA_DROP: 0
SA_BLOCKING_DROP: 0
DA_BLOCKING_DROP: 0
L2_INVALID_DPM_DROP: 0
MCST_INVALID_DPM_DROP: 0
RX_FLOW_CONTROL_DROP: 0
STORM_SPPRS_DROP: 0
LALS_DROP: 0
VLAN_EGR_FILTER_DROP: 0
STP_EGR_DROP: 0
SRC_PORT_FILTER_DROP: 2
PORT_ISOLATION_DROP: 0
ACL_FLTR_DROP: 0
MIRROR_FLTR_DROP: 0
TX_MAX_DROP: 0
LINK_DOWN_DROP: 0
FLOW_CONTROL_DROP: 0
BRIDGE .1d discards: 0


Not sure I learned anything useful from this.  Are the counters reset on
every read?
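
One way to approach the reset question is to compare the two reads
directly. The following is a minimal Python sketch, assuming the
"NAME: value" format shown above; the parse_counters helper is made up for
illustration and only a few of the counters are pasted in.

```python
def parse_counters(text: str) -> dict:
    """Parse 'NAME: value' lines as printed by the drop_counters file."""
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.rpartition(":")
        if sep and value.strip().isdigit():
            counters[name.strip()] = int(value)
    return counters

# Abbreviated excerpts of the two reads above.
first = parse_counters("""
ALE_TX_GOOD_PKTS: 74
STP_IGR_DROP: 3
VLAN_EGR_FILTER_DROP: 33
SRC_PORT_FILTER_DROP: 10
""")
second = parse_counters("""
ALE_TX_GOOD_PKTS: 35
STP_IGR_DROP: 0
VLAN_EGR_FILTER_DROP: 0
SRC_PORT_FILTER_DROP: 2
""")

# Counters whose value went down between the first and the second read.
decreased = {name: (first[name], second[name])
             for name in first if second.get(name, 0) < first[name]}
print(decreased)
```

All four of these counters are lower in the second read. A purely
cumulative counter would not do that without a wrap or a reset in between,
so the numbers are at least consistent with clear-on-read. They do not
prove it.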


> To debug the issues in master you could comment out the things done in
> dsa.c:rtl83xx_vlan_setup(). The only critical one for the multicast is 
> the setup of the vlans in the loop below
> // Initialize all vlans 0-4095

That's pretty much everything it does, isn't it? :-)

In any case, I tried it now with this and it does make the VLAN stuff
work again (still same issue with the FCS on the DHCP requests):

diff --git a/target/linux/realtek/files-5.4/drivers/net/dsa/rtl83xx/dsa.c b/target/linux/realtek/files-5.4/drivers/net/dsa/rtl83xx/dsa.c
index c5f243c55abd..d972bf6ff8b9 100644
--- a/target/linux/realtek/files-5.4/drivers/net/dsa/rtl83xx/dsa.c
+++ b/target/linux/realtek/files-5.4/drivers/net/dsa/rtl83xx/dsa.c
@@ -119,8 +119,8 @@ static void rtl83xx_vlan_setup(struct rtl838x_switch_priv *priv)
 
        pr_info("In %s\n", __func__);
 
-       priv->r->vlan_profile_setup(0);
-       priv->r->vlan_profile_setup(1);
+//     priv->r->vlan_profile_setup(0);
+//     priv->r->vlan_profile_setup(1);
        pr_info("UNKNOWN_MC_PMASK: %016llx\n", priv->r->read_mcast_pmask(UNKNOWN_MC_PMASK));
        priv->r->vlan_profile_dump(0);
 

The difference seems to be related to whether the CPU port(?) is a
member of UNKNOWN_MC_PMASK or not.  Failing:

[    7.615045] UNKNOWN_MC_PMASK: 000000000fffffff
[    7.620099] VLAN profile 0: L2 learning: 1, UNKN L2MC FLD PMSK 511,          UNKN IPMC FLD PMSK 511, UNKN IPv6MC FLD PMSK: 511       

Working:

[    7.625941] UNKNOWN_MC_PMASK: 000000001fffffff
[    7.630907] VLAN profile 0: L2 learning: 1, UNKN L2MC FLD PMSK 511,          UNKN IPMC FLD PMSK 511, UNKN IPv6MC FLD PMSK: 511
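
The two UNKNOWN_MC_PMASK values logged above differ in a single bit. A
minimal Python sketch to show which one, with the assumption (not
confirmed by these logs) that the CPU port on the RTL838x is port 28:

```python
failing = 0x000000000FFFFFFF
working = 0x000000001FFFFFFF

def set_bits(mask: int) -> list:
    """Return the positions of the bits set in mask, lowest first."""
    return [bit for bit in range(mask.bit_length()) if mask >> bit & 1]

# Ports that are members of one mask but not the other.
differing = set_bits(failing ^ working)
print("ports only in the working mask:", differing)

# Assumption (not confirmed by the logs above): the CPU port is port 28.
ASSUMED_CPU_PORT = 28
print("matches assumed CPU port:", differing == [ASSUMED_CPU_PORT])
```

The failing mask covers ports 0 to 27 and the working mask adds port 28,
which fits the CPU port guess if that port number is right.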



Actually, you don't have to disable the profile_setups.  This change is
sufficient to make VLANs work (at least in the limited testing I've done):


diff --git a/target/linux/realtek/files-5.4/drivers/net/dsa/rtl83xx/rtl838x.c b/target/linux/realtek/files-5.4/drivers/net/dsa/rtl83xx/rtl838x.c
index dfd773c5e6fc..5d764b6a32d6 100644
--- a/target/linux/realtek/files-5.4/drivers/net/dsa/rtl83xx/rtl838x.c
+++ b/target/linux/realtek/files-5.4/drivers/net/dsa/rtl83xx/rtl838x.c
@@ -398,7 +398,7 @@ static void rtl838x_vlan_profile_setup(int profile)
         * On RTL93XX, the portmask is directly set in the profile,
         * see e.g. rtl9300_vlan_profile_setup
         */
-       rtl838x_write_mcast_pmask(UNKNOWN_MC_PMASK, 0xfffffff);
+       rtl838x_write_mcast_pmask(UNKNOWN_MC_PMASK, 0x1fffffff);
 }
 
 static inline int rtl838x_vlan_port_egr_filter(int port)


No idea why or how.  Just a magic value with a magic result, like most
of the code.

I really don't think these changes have been tested well enough to be
pushed into openwrt master yet.  Based on this first impression, I would
be surprised if there isn't more broken stuff.



Bjørn


