[LEDE-DEV] [PATCH v2] ramips: add support for Ubiquiti EdgeRouter X-SFP

Tue Jun 13 10:08:15 PDT 2017

To my limited knowledge, RCU is a notoriously difficult pattern to get right.

To avoid locking in concurrent processing contexts, RCU uses primitives such as memory barriers, which demand that very strict and often subtle rules be followed. Implementations of memory barriers are very architecture specific and, again to my limited knowledge, different CPUs of the same architecture may expose quirks in some of the instructions used, which then have to be coded around.
In short: this problem likely requires a lot of rare expertise.

In reply to Kevin's question about things to try: comparing how often these stalls occur in the RCU kernel thread on different CPUs of the same architecture might yield a clue as to whether some CPU-specific quirk is involved.
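
To make that comparison a bit more concrete, here is a rough sketch of how the stall reports in a saved kernel log could be tallied per CPU number. The log format is an assumption on my part: a header line such as "INFO: rcu_sched self-detected stall on CPU" or "INFO: rcu_sched detected stalls on CPUs/tasks:", followed by one line per stalled CPU that starts with the CPU number and a dash. The exact layout differs between kernel versions, so the patterns will probably need adjusting. The function name count_stalls_per_cpu is made up for this example.

```python
import re
from collections import Counter

# Assumed header of an RCU stall report; wording varies by kernel version.
HEADER = re.compile(r"INFO: rcu_\w+ (?:self-)?detected stalls? on CPU")

# Assumed per-CPU detail line, e.g. "[  100.000000] \t1-...: (2100 ticks this GP) ...".
# The leading dmesg timestamp is optional.
CPU_LINE = re.compile(r"^(?:\[\s*\d+\.\d+\]\s*)?\s*(\d+)-\S*: \(")


def count_stalls_per_cpu(lines):
    """Count how many stall reports name each CPU number."""
    counts = Counter()
    in_report = False
    for line in lines:
        if HEADER.search(line):
            # A new stall report starts; per-CPU lines are expected next.
            in_report = True
            continue
        if in_report:
            match = CPU_LINE.match(line)
            if match:
                counts[int(match.group(1))] += 1
            else:
                # First line that is not a per-CPU line ends the report.
                in_report = False
    return counts


if __name__ == "__main__":
    import sys

    # Usage: dmesg > log.txt; python3 this_script.py log.txt
    if len(sys.argv) > 1:
        with open(sys.argv[1], errors="replace") as log:
            for cpu, n in sorted(count_stalls_per_cpu(log).items()):
                print(f"CPU {cpu}: {n} stall report(s)")
```

Running this on logs from several boards with the same SoC family would at least show whether the stalls cluster on particular cores or particular CPU models. It says nothing about the cause.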

Could not help painting a grim picture,
Paul

(in the Netherlands the green party, GroenLinks, today dropped out of the negotiations for a new government; that makes me sad)

> On 13 Jun 2017, at 10:49, Kevin Darbyshire-Bryant <kevin at darbyshire-bryant.me.uk> wrote:
> 
> 
> 
> On 12/06/17 21:00, Toke Høiland-Jørgensen wrote:
>> p.wassi at gmx.at writes:
>>> My SQM configuration was basically just using cake + piece_of_cake.qos,
>>> but that's clearly off topic for now. (I'm also CC'ing this mail to Toke,
>>> the maintainer of sqm-scripts).
>> If you're crashing the box my guess would be there's a bug in the cake
>> qdisc somewhere. What happens if you run SQM with fq_codel instead?
>> -Toke
> 
> This isn't the first time I've heard cake implicated in cpu stalls but trying to discern a signal in some of the noise is difficult.
> 
> Using 'fq_codel' would be a good first elimination round.
> 
> For the second round of elimination: Cake is the only qdisc to my knowledge that pulls apart large 'GSO' (Generic Segmentation Offload) packets prior to sending them up the stack, a process cake calls 'peeling'. It does this to retain control over how to schedule an (up to 64K) 'super packet', breaking it up into a series of 1500 byte packets instead. Some have reported 'messing with ethtool' to disable GSO as being helpful. I know not how 'ethtool' works.
> 
> Whether this is a bug in the cake peeling code or in the network interface driver is unclear, and again anecdotal evidence suggests it is only seen on multi-cpu systems. I dread to think what happens if one cpu starts 'grabbing one of those large skbs' for sending purposes whilst another (in cake) is busy breaking it apart, or indeed whether that scenario is even possible.
> 
> Some ideas/thoughts/things to try :-)  Apologies for the continuing hijack.
> 
> KDB