[LEDE-DEV] Transmit timeouts with mtk_eth_soc and MT7621

John Crispin john at phrozen.org
Sat Aug 19 14:52:13 PDT 2017



On 19/08/17 23:13, Kristian Evensen wrote:
> Hi both,
>
> On Sat, 19 Aug 2017 at 20:16, John Crispin <john at phrozen.org> wrote:
>
>     Hi All,
>
>     i have a staged commit on my laptop that makes all the (upstream)
>     ethernet fixes that i pushed to mt7623 work on mt7621. please hang on
>     for a few more days until i have finished testing the support. this
>     will add the latest upstream ethernet support + DSA
>
>
> Thanks for the follow-up Mingyu and the info John. I have not had time 
> to investigate the issue further (holiday backlog ...), but will start 
> working on trying to reproduce it at the end of next week. I have 
> deployed the patch to some routers and have not seen any regressions, 
> but I would like to know how to reliably trigger the issue before 
> concluding :)
>
> John, do your commits include a fix similar to what Mingyu sent me?


with my fixes the mt7623 passes a 48h stress test in which the unit runs 
an iperf test with 200 parallel flows at full wire speed. once backported 
to mt7621 i am pretty confident that the fix will yield the maximum 
stable performance we can get.
      John

>
> Kristian
>
>
>
>          John
>
>
>     On 19/08/17 17:06, Mingyu Li wrote:
>     > Hi Kristian.
>     >
>     > does this patch work?
>     >
>     > 2017-07-24 23:45 GMT+08:00 Mingyu Li <igvtee at gmail.com>:
>     >> i guess other interrupts may be causing the problem, because the
>     >> ethernet receive flow is interrupted by other hardware. so using an
>     >> sd card, wifi or usb can generate the interrupts.
>     >>
>     >> 2017-07-24 17:19 GMT+08:00 Kristian Evensen <kristian.evensen at gmail.com>:
>     >>> Hi,
>     >>>
>     >>> On Mon, Jul 24, 2017 at 4:02 AM, Mingyu Li <igvtee at gmail.com> wrote:
>     >>>> i guess the problem is that some tx data is not freed but the tx
>     >>>> interrupt is cleared, which causes the tx timeout. the old code
>     >>>> frees the data first and then clears the interrupt, but new data
>     >>>> may arrive after the free and before the interrupt is cleared.
>     >>>> so change it to clear the interrupt first and then clean all tx
>     >>>> data (also remove the budget limit). if new tx data arrives, the
>     >>>> hardware will set the tx interrupt flag and we will free it next
>     >>>> time.
>     >>>> i also applied this to the rx flow.
>     >>> Thanks for the detailed explanation. I have deployed an image with
>     >>> the patch to some of the routers showing this issue, so let's wait
>     >>> and see. Of course, all routers have been stable for the last
>     >>> couple of days (including before the weekend) now, so I will let
>     >>> them run for a week or so and then report back.
>     >>>
>     >>> In order to ease testing and make it more controlled, do you have
>     >>> any suggestions for how to trigger the error? Is it "just" a timing
>     >>> issue or should I be able to trigger it with, for example, a
>     >>> specific traffic pattern?
>     >>>
>     >>> -Kristian
>     > _______________________________________________
>     > Lede-dev mailing list
>     > Lede-dev at lists.infradead.org
>     > http://lists.infradead.org/mailman/listinfo/lede-dev
>
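
For readers following the thread, the ordering problem Mingyu describes in
the quoted explanation above can be sketched as a small user-space model.
This is only an illustration under stated assumptions: struct fake_hw and
the functions hw_complete, reclaim_all, poll_free_then_ack,
poll_ack_then_free and tx_stuck are hypothetical names invented here, not
code from mtk_eth_soc or from the patch. The "late" argument stands for
packets the hardware completes inside the race window.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a TX ring; NOT the mtk_eth_soc data structures. */
struct fake_hw {
    int done;        /* descriptors completed by hardware, not yet freed */
    bool irq_status; /* latched "TX done" interrupt flag */
};

/* Hardware completes n more packets and latches the interrupt flag. */
static void hw_complete(struct fake_hw *hw, int n)
{
    assert(n >= 0);
    if (n > 0) {
        hw->done += n;
        hw->irq_status = true;
    }
}

/* Free every descriptor completed so far (no budget limit). */
static int reclaim_all(struct fake_hw *hw)
{
    int freed = hw->done;

    hw->done = 0;
    return freed;
}

/* Old order: free first, then clear the interrupt flag. */
static void poll_free_then_ack(struct fake_hw *hw, int late)
{
    reclaim_all(hw);
    hw_complete(hw, late);  /* completions landing in the race window */
    hw->irq_status = false; /* the ack also wipes the flag for them */
}

/* Proposed order: clear the interrupt flag first, then free. */
static void poll_ack_then_free(struct fake_hw *hw, int late)
{
    hw->irq_status = false;
    hw_complete(hw, late);  /* late completions re-latch the flag */
    reclaim_all(hw);
}

/*
 * A descriptor is stuck when it has completed but no interrupt is
 * pending that would trigger another poll to free it.
 */
static bool tx_stuck(const struct fake_hw *hw)
{
    return hw->done > 0 && !hw->irq_status;
}
```

In this model, a completion that arrives inside the window leaves the
free-then-ack variant with an unfreed descriptor and no pending interrupt,
which corresponds to the situation suspected of ending in a tx timeout. The
ack-then-free variant either frees the late descriptor in the same pass or
keeps the flag set, so a later poll picks it up.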



