CFT: move outer_cache_sync() out of line

Russell King - ARM Linux linux at arm.linux.org.uk
Mon Jan 12 08:36:48 PST 2015


The theory is that moving outer_cache_sync() out of line moves the
conditional test from multiple sites to a single location, which
in turn means that the branch predictor stands a better chance of
making the correct decision.

However, I'm a hardware pauper - I have the choice of testing this
on iMX6 (which seems to have bus limitations, and seems to produce
a wide variation on measured performance which makes it hard to
evaluate any differences) or the Versatile Express, which really is
not suitable for making IO performance measurements.

So, without help from people with "good" hardware, I can't evaluate
the performance impact of this change.  Anyone willing to do some
performance testing on this and feedback any numbers?

Obviously, drivers which make use of a lot of non-relaxed IO accessors
will be most affected by this - so you really have to know your
drivers when deciding how to evaluate this (sorry, I can't say "measure
network bandwidth" or "measure SSD sata performance" is a good test.)

Theoretically, this should help overall system performance, since the
branch predictor should be able to predict this better, but it's entirely
possible that trying to benchmark a single workload won't be measurably
different.

In terms of kernel size figures, this change alone saves almost 17K of
10MB of kernel text on my iMX6 kernels - which is bordering on
insignificant since that's not quite a 0.2% saving.

So... right now I can't justify this change, but I'm hoping some can come
up with some figures which shows that it benefits their workload without
causing a performance regression for others.

diff --git a/arch/arm/include/asm/outercache.h b/arch/arm/include/asm/outercache.h
index 891a56b35bcf..807e4e71c8e7 100644
--- a/arch/arm/include/asm/outercache.h
+++ b/arch/arm/include/asm/outercache.h
@@ -133,11 +133,7 @@ static inline void outer_resume(void) { }
  * Ensure that all outer cache operations are complete and any store
  * buffers are drained.
  */
-static inline void outer_sync(void)
-{
-	if (outer_cache.sync)
-		outer_cache.sync();
-}
+extern void outer_sync(void);
 #else
 static inline void outer_sync(void)
 { }
diff --git a/arch/arm/mm/l2c-common.c b/arch/arm/mm/l2c-common.c
index 10a3cf28c362..b1c24c8c1eb9 100644
--- a/arch/arm/mm/l2c-common.c
+++ b/arch/arm/mm/l2c-common.c
@@ -7,9 +7,17 @@
  * published by the Free Software Foundation.
  */
 #include <linux/bug.h>
+#include <linux/export.h>
 #include <linux/smp.h>
 #include <asm/outercache.h>
 
+void outer_sync(void)
+{
+	if (outer_cache.sync)
+		outer_cache.sync();
+}
+EXPORT_SYMBOL(outer_sync);
+
 void outer_disable(void)
 {
 	WARN_ON(!irqs_disabled());


-- 
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.



More information about the linux-arm-kernel mailing list