CFT: move outer_cache_sync() out of line
Jisheng Zhang
jszhang at marvell.com
Mon Jan 12 22:48:51 PST 2015
Hi Russell,
On Mon, 12 Jan 2015 08:36:48 -0800
Russell King - ARM Linux <linux at arm.linux.org.uk> wrote:
> The theory is that moving outer_cache_sync() out of line moves the
> conditional test from multiple sites to a single location, which
> in turn means that the branch predictor stands a better chance of
> making the correct decision.
>
> However, I'm a hardware pauper - I have the choice of testing this
> on iMX6 (which seems to have bus limitations, and seems to produce
> a wide variation on measured performance which makes it hard to
> evaluate any differences) or the Versatile Express, which really is
> not suitable for making IO performance measurements.
>
> So, without help from people with "good" hardware, I can't evaluate
> the performance impact of this change. Anyone willing to do some
> performance testing on this and feedback any numbers?
I applied your patch into marvell internal bsp tree, then did a basic emmc
performance test via. dd cmd, the performance is improved by 0.13%
the sdhc host driver is drivers/mmc/host/sdhci-pxav3.c, I'm not sure
it make use of a lot of non-relaxed IO accessors.
>
> Obviously, drivers which make use of a lot of non-relaxed IO accessors
> will be most affected by this - so you really have to know your
> drivers when deciding how to evaluate this (sorry, I can't say "measure
> network bandwidth" or "measure SSD sata performance" is a good test.)
>
> Theoretically, this should help overall system performance, since the
> branch predictor should be able to predict this better, but it's entirely
> possible that trying to benchmark a single workload won't be measurably
> different.
>
> In terms of kernel size figures, this change alone saves almost 17K of
> 10MB of kernel text on my iMX6 kernels - which is bordering on
> insignificant since that's not quite a 0.2% saving.
>
> So... right now I can't justify this change, but I'm hoping some can come
> up with some figures which shows that it benefits their workload without
> causing a performance regression for others.
>
> diff --git a/arch/arm/include/asm/outercache.h
> b/arch/arm/include/asm/outercache.h index 891a56b35bcf..807e4e71c8e7 100644
> --- a/arch/arm/include/asm/outercache.h
> +++ b/arch/arm/include/asm/outercache.h
> @@ -133,11 +133,7 @@ static inline void outer_resume(void) { }
> * Ensure that all outer cache operations are complete and any store
> * buffers are drained.
> */
> -static inline void outer_sync(void)
> -{
> - if (outer_cache.sync)
> - outer_cache.sync();
> -}
> +extern void outer_sync(void);
> #else
> static inline void outer_sync(void)
> { }
> diff --git a/arch/arm/mm/l2c-common.c b/arch/arm/mm/l2c-common.c
> index 10a3cf28c362..b1c24c8c1eb9 100644
> --- a/arch/arm/mm/l2c-common.c
> +++ b/arch/arm/mm/l2c-common.c
> @@ -7,9 +7,17 @@
> * published by the Free Software Foundation.
> */
> #include <linux/bug.h>
> +#include <linux/export.h>
> #include <linux/smp.h>
> #include <asm/outercache.h>
>
> +void outer_sync(void)
> +{
> + if (outer_cache.sync)
> + outer_cache.sync();
> +}
> +EXPORT_SYMBOL(outer_sync);
> +
> void outer_disable(void)
> {
> WARN_ON(!irqs_disabled());
>
>
More information about the linux-arm-kernel
mailing list