[LEDE-DEV] Transmit timeouts with mtk_eth_soc and MT7621

Kristian Evensen kristian.evensen at gmail.com
Thu Nov 16 07:15:01 PST 2017


Hello,

On Thu, Nov 9, 2017 at 8:42 PM, Kristian Evensen
<kristian.evensen at gmail.com> wrote:
> I replaced the 3526 with other devices containing the mt7530 switch
> (both mt7621 and mt7623-based boards), and the issue seems to be
> related to the switch rather than the SoC. I am able to reliably
> trigger the timeout on all devices I have tested, both running
> proprietary drivers/firmware and LEDE. I guess this points to that
> there is some traffic pattern or network behavior that triggers an
> error in the MT7530 and causes TX to freeze. Restarting the ports
> makes the switch work again, but as long as the "bad" device is
> connected to the mt7530 then it is just a matter of time before the
> timeout is back.

I think I am ready to draw a conclusion on this issue. First of all, I
made an incorrect statement earlier: I have not seen the problem with
flow control disabled. After finding a network tap and a device that
passes pause frames (among other things) up to the driver, so that I
can see them with tcpdump, I think I finally see what is going on.

I connected the tap between router #1 and router #2, and ran the
test described earlier with flow control both enabled and disabled.
When triggering the RCU stall, I see a continuous flood of pause frames
coming from router #2. This flood happens regardless of whether flow
control is enabled. However, with flow control enabled, the RxPause and
TxPause counters increase; with flow control disabled, they remain at 0.
In other words, it seems that the switch filters out pause frames if
the bit is unset in the feature register (it would be great if someone
could confirm/deny this).
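
For anyone who wants to pick pause frames out of a raw capture, here is a
minimal sketch of how an IEEE 802.3x PAUSE frame can be recognized. It is
not taken from any driver. The function names are made up for illustration,
and the input is assumed to be a raw Ethernet frame without the FCS. The
constants (EtherType 0x8808, opcode 0x0001, destination 01:80:c2:00:00:01,
one quantum = 512 bit times) come from the standard.

```python
import struct

# Values defined by IEEE 802.3x (MAC Control / PAUSE).
PAUSE_DST = bytes.fromhex("0180c2000001")  # reserved multicast destination
ETHERTYPE_MAC_CONTROL = 0x8808
OPCODE_PAUSE = 0x0001
BITS_PER_QUANTUM = 512


def parse_pause_frame(frame: bytes):
    """Return the pause time in quanta if `frame` is a PAUSE frame, else None.

    The destination address is not checked, because a PAUSE frame may also
    be sent to the unicast address of the link partner.
    """
    if len(frame) < 18:
        return None
    # Bytes 12..17: EtherType, MAC Control opcode, pause time (big-endian).
    ethertype, opcode, quanta = struct.unpack("!HHH", frame[12:18])
    if ethertype != ETHERTYPE_MAC_CONTROL or opcode != OPCODE_PAUSE:
        return None
    return quanta


def pause_seconds(quanta: int, link_bps: int) -> float:
    """Convert a pause time in quanta to seconds for a given link speed."""
    return quanta * BITS_PER_QUANTUM / link_bps


def build_pause_frame(src: bytes, quanta: int) -> bytes:
    """Build a PAUSE frame (hypothetical helper, used only for testing)."""
    header = PAUSE_DST + src + struct.pack(
        "!HHH", ETHERTYPE_MAC_CONTROL, OPCODE_PAUSE, quanta)
    return header.ljust(60, b"\x00")  # pad to minimum frame size, no FCS


if __name__ == "__main__":
    frame = build_pause_frame(bytes.fromhex("020000000002"), 0xFFFF)
    quanta = parse_pause_frame(frame)
    print(quanta, pause_seconds(quanta, 1_000_000_000))
```

A sender that keeps repeating frames with a non-zero pause time before the
previous one expires can hold the link partner's transmitter off
indefinitely, which matches the continuous flood described above.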

The MT7530 switch seems to use one buffer for all ports, so what I
have seen all along is head-of-line blocking. Since I use iperf in UDP
mode, the traffic destined for router #2 never slows down and fills up
the buffer. Thus, all other traffic is blocked. When I disable the
port used by router #2, the buffer is cleared and packets flow as
normal again. With flow control disabled, I do not see the head-of-line
blocking: if I am connected to router #1, I can always reach it.
With flow control enabled, router #1 stops replying to, for example,
ping when the pause flood starts.
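
The effect can be illustrated with a toy model. This is only a sketch under
stated assumptions, not a description of how the MT7530 manages its memory:
it assumes two egress ports ("r1" and "r2", hypothetical names) sharing one
buffer pool, one frame offered to each port per tick, and a pause flood that
stalls the r2 port for the whole run whenever pause frames are honored.

```python
from collections import deque


def simulate(ticks: int, pool_size: int, honor_pause: bool):
    """Toy model of two egress ports sharing one buffer pool.

    Port "r2" is asked to pause for the whole run. If `honor_pause` is
    False the switch ignores the pause frames and keeps transmitting.
    Returns (delivered, dropped) frame counts per port.
    """
    queues = {"r1": deque(), "r2": deque()}
    delivered = {"r1": 0, "r2": 0}
    dropped = {"r1": 0, "r2": 0}

    for _ in range(ticks):
        # Egress: each port sends one queued frame per tick unless paused.
        for port, queue in queues.items():
            paused = honor_pause and port == "r2"
            if queue and not paused:
                queue.popleft()
                delivered[port] += 1

        # Ingress: the UDP flood toward r2 arrives first, then one frame
        # for r1. A frame is admitted only if the shared pool has room.
        for port in ("r2", "r1"):
            used = sum(len(q) for q in queues.values())
            if used < pool_size:
                queues[port].append(1)
            else:
                dropped[port] += 1

    return delivered, dropped


if __name__ == "__main__":
    print("flow control on: ", simulate(100, 16, honor_pause=True))
    print("flow control off:", simulate(100, 16, honor_pause=False))
```

With pause frames honored, the r2 queue grows until it owns the whole pool,
after which frames for r1 are dropped as well even though r1 itself is
healthy. With pause frames ignored, both queues drain every tick and nothing
is dropped in this model.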

I don't know what the correct "solution" to this problem is. I asked
Piotr to mark my patch for always disabling flow control as not
applicable, but perhaps it should be brought back if everyone agrees
that disabling flow control is ok. If not, then perhaps the following
patch should be accepted so that it is possible to switch flow control
on/off: https://lists.openwrt.org/pipermail/openwrt-devel/2016-April/040705.html

BR,
Kristian


