[LEDE-DEV] Adding new targets/subtargets

Mon Jan 2 16:57:03 PST 2017

> On Jan 2, 2017, at 1:05 PM, Stefan Lippers-Hollmann <s.L-H at gmx.de> wrote:
> 
> Hi
> 
> On 2017-01-02, Philip Prindeville wrote:
>>> On Jan 2, 2017, at 10:01 AM, Jo-Philipp Wich <jo at mein.io> wrote:
>>> 
>>> Hi,
>>> 
>>>> The x86/64/config-default is missing the following switches:
>>>> 
>>>> CONFIG_MCORE2=y  
> [...]
>> Right, this is why I’m trying to create a new target (or subtarget) called “xeon” which is optimized for Xeon targets and leverages the on-chip crypto-accelerators.
> 
> This is just an optimization, but not actually needed to get the
> firmware running on your target CPU. While it may, or may not, provide
> measurable speedups, none of the large(r) binary distros consider this
> to be a necessary optimization, so why do you think it's necessary to
> provide just this tiny micro-optimization as a dedicated subtarget with
> all the overhead this entails - rather than just using it as a local
> configuration for your own builds?

There are several useful reasons to do so.

Because it may shake out some bugs in the x86_64 configuration or make machinery.

Because it may provide a useful example of how someone might optimize a build should they have a similar need in the future.

Because the low-power Opteron, the Athlon-FX/2, Atom64, Core2/duo, and 5th gen Xeon processors have significant differences in pipeline depth and in how many cycles it takes the ALU to settle after an overlapped operation, which has strong implications for how the code generator, optimizer, and peephole post-optimizer (in an optimizing loader) would emit code.

Lastly, because we’re not limited to committing this directly into the main tree… it could also go into a “targets” feed.

> 
>> We’ve come a long way since the Athlon-64 (k8) in 2004.
> 
> The situation on amd64/ x86_64 is quite a bit better than on i[3456]86,
> probably very little actually makes a difference for routing tasks
> (this could be different if LEDE would be a common basis for image
> or video transcoding, but I seriously doubt that optimizing for core2
> would actually make a significant difference on a router, especially
> considering that pretty much any amd64/ x86_64 CPU[1] is way more 
> powerful than any of the more prevalent routing architectures). I think 
> it would be useful to actually show the difference your change makes on 
> modern CPUs, before proactively introducing new subtargets for cosmetic 
> reasons.

Well, I’d be happy to try that, but I first need to validate the build machinery. Since we’ve only ever had one x86_64 target, it hasn’t been validated; and since it hasn’t been validated, that has probably discouraged anyone from testing (and adding) any useful additional targets. So we have a circular dependency.

If you’re using a single server with several 802.11ac cards in it and point-to-point connections, you can easily get into several gigabits/sec worth of traffic.  Further, if you have multi-tenancy and per-customer SLAs, you can get into some sophisticated shaping/policing scenarios, and it might be the case that an x86, while “way more powerful than any of the more prevalent routing architectures”, is still going to be taxed as a software-only solution (versus, say, a MIPS 7K with ASICs).
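
To put rough numbers on "taxed", here is a back-of-envelope sketch of how many CPU cycles one core gets per packet if it has to keep up with line rate. Everything in it is an assumption for illustration: the 2 GHz clock, the link rates, the frame sizes, and the use of wired Ethernet framing overhead (the function names are made up too). It is not a measurement of any real box.

```python
# Back-of-envelope CPU budget per packet for software forwarding.
# All inputs are illustrative assumptions, not measurements.

# Per-frame overhead on wired Ethernet that is not part of the frame itself:
# 7 bytes preamble + 1 byte start-of-frame delimiter + 12 bytes inter-frame gap.
ETHERNET_OVERHEAD_BYTES = 20


def packets_per_second(link_bps: float, frame_bytes: int) -> float:
    """Packet rate at line rate for a given frame size (frame includes FCS)."""
    bits_on_wire = (frame_bytes + ETHERNET_OVERHEAD_BYTES) * 8
    return link_bps / bits_on_wire


def cycles_per_packet(cpu_hz: float, link_bps: float, frame_bytes: int) -> float:
    """Cycles one core can spend per packet while still keeping up with the link."""
    return cpu_hz / packets_per_second(link_bps, frame_bytes)


if __name__ == "__main__":
    cpu_hz = 2.0e9  # assumed 2 GHz core, chosen only for round numbers
    for gbps in (1, 4, 10):          # assumed aggregate link rates
        for frame in (64, 1518):     # minimum and maximum standard frame sizes
            budget = cycles_per_packet(cpu_hz, gbps * 1e9, frame)
            print(f"{gbps:>2} Gbit/s, {frame:>4}-byte frames: "
                  f"{budget:8.1f} cycles/packet")
```

With small frames the per-packet budget shrinks quickly as the aggregate rate climbs, and shaping/policing has to fit inside whatever is left after the basic forwarding path. That is the regime where code generation tuned to the actual CPU might start to matter, though only a benchmark would say by how much.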

> 
> Regards
> 	Stefan Lippers-Hollmann
> 
> [1]	I'm quite convinced that even a 2003 vintage AMD64 Opteron from
> 	the first generation sledgehammer design wouldn't find its 
> 	limitations on the CPU side (unless you go beyond 1 GBit/s), but
> 	rather on the bus connection of your ethernet cards (old PCI 
> 	won't saturate a 1 GBit/s link).

And that’s the scenario I’m looking at: multiple connections, each potentially carrying significantly more than a gigabit per second.

-Philip