lib/gcd.c regression on arm

Jisheng Zhang jszhang at marvell.com
Mon Jul 18 05:15:49 PDT 2016


Dear Cheah,

On Fri, 15 Jul 2016 21:51:10 +0800 Cheah Kok Cheong wrote:

> Commit fff7fb0b2d90 ("lib/gcd.c: use binary GCD algorithm instead of Euclidean")
> replaced the Euclidean algorithm totally with the Binary algorithm.
> Two variants were provided and selected via Kconfig depending on whether
> a fast __ffs (find least significant set bit) instruction is available.
> 
> For arm v5 and above the fast __ffs version is used as evident in
> arch/arm/mm/Kconfig.
> 
> I benchmarked the gcd performance using the code provided in the commit
> with a Cortex-A9 based Mediatek MT6577. Three runs at different settings
> were used.
> 
> The Binary algo with fast __ffs is slower than the Euclidean algo. Using
> the non-ffs version [even/odd variant] gives performance comparable to
> the Euclidean algo.
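
For readers following along, here is a rough userspace sketch of the three
approaches being compared: the Euclidean loop, a binary GCD that relies on
a fast find-first-set, and the even/odd binary GCD that only uses single-bit
shifts. This is not the lib/gcd.c source and not the benchmark code from the
commit. The function names are made up for illustration, and
__builtin_ctzl() (a GCC/Clang builtin) is assumed as a stand-in for the
kernel's __ffs().

```c
/* Illustrative userspace sketch only; names are hypothetical. */

/* Euclidean algorithm: one division (modulo) per iteration. */
static unsigned long gcd_euclid(unsigned long a, unsigned long b)
{
	while (b) {
		unsigned long t = a % b;

		a = b;
		b = t;
	}
	return a;
}

/*
 * Binary GCD assuming a cheap find-first-set.
 * __builtin_ctzl() stands in for __ffs() here (assumption).
 */
static unsigned long gcd_binary_ffs(unsigned long a, unsigned long b)
{
	unsigned long r = a | b;
	int shift;

	if (!a || !b)
		return r;

	shift = __builtin_ctzl(r);	/* power of two common to a and b */
	a >>= __builtin_ctzl(a);	/* make a odd */
	b >>= __builtin_ctzl(b);	/* make b odd */

	while (a != b) {
		if (a < b) {
			unsigned long t = a;

			a = b;
			b = t;
		}
		a -= b;			/* odd - odd: even and non-zero */
		a >>= __builtin_ctzl(a);	/* strip all trailing zeros at once */
	}
	return a << shift;
}

/* Binary GCD without ffs: trailing zeros are stripped one bit at a time. */
static unsigned long gcd_binary_noffs(unsigned long a, unsigned long b)
{
	unsigned long r = a | b;

	if (!a || !b)
		return r;

	r &= -r;			/* isolate lowest set bit of a | b */

	while (!(b & r))
		b >>= 1;
	while (!(a & r))
		a >>= 1;

	while (a != b) {
		if (a < b) {
			unsigned long t = a;

			a = b;
			b = t;
		}
		a -= b;			/* now a multiple of 2 * r */
		while (!(a & r))
			a >>= 1;
	}
	return a;			/* still scaled by r, no final shift */
}
```

The only real difference between the two binary variants is how trailing
zero bits are removed, so which one wins depends on how cheap the
count-trailing-zeros operation is relative to a short shift loop on a given
core and compiler.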

Interesting. Using the code in the commit, I get the following results
on a Cortex-A53 (CA53) platform:

Built with the aarch64 toolchain, -O2 -mcpu=cortex-a53:

~ # /a53 -r 500000 -n 10
gcd0: elapsed 10170
gcd1: elapsed 11340
gcd2: elapsed 13590
gcd3: elapsed 11700
gcd4: elapsed 14230
PASS

Built with the armhf toolchain, -O2 -mcpu=cortex-a53:

~ # /a53_32 -r 500000 -n 10
gcd0: elapsed 9490
gcd1: elapsed 10220
gcd2: elapsed 10790
gcd3: elapsed 10270
gcd4: elapsed 10850
PASS


> 
> It will be interesting to see whether this is also true for other
> platforms with arm v5 and above. Hopefully others will do some testing.
> If this is the case then we should "select CPU_NO_EFFICIENT_FFS" in our
> Kconfig.
> 
> Thanks.
> Best Regards,
> Cheah
> 
> cross compiled with '-O2'
> 
> Euclidean                 Binary with ffs           Binary no ffs
> 
> 
> gcd -r 50000 -n 10        
> 
> gcd0: elapsed 25766       gcd0: elapsed 25766       gcd0: elapsed 25765
> gcd1: elapsed 19994       gcd1: elapsed 20224       gcd1: elapsed 19843
> gcd2: elapsed 20071       gcd2: elapsed 20533       gcd2: elapsed 20151
> gcd3: elapsed 20070       gcd3: elapsed 20380       gcd3: elapsed 19919
> gcd4: elapsed 20148       gcd4: elapsed 20610       gcd4: elapsed 20151
> PASS                      PASS                      PASS
>            
> gcd0: elapsed 26690       gcd0: elapsed 26612       gcd0: elapsed 24381
> gcd1: elapsed 20224       gcd1: elapsed 20379       gcd1: elapsed 19765
> gcd2: elapsed 20224       gcd2: elapsed 20304       gcd2: elapsed 19842
> gcd3: elapsed 20148       gcd3: elapsed 20302       gcd3: elapsed 19919
> gcd4: elapsed 20301       gcd4: elapsed 20302       gcd4: elapsed 19919
> PASS                      PASS                      PASS
>                                          
> gcd0: elapsed 25842       gcd0: elapsed 26459       gcd0: elapsed 25457
> gcd1: elapsed 20454       gcd1: elapsed 20532       gcd1: elapsed 20225
> gcd2: elapsed 20378       gcd2: elapsed 20762       gcd2: elapsed 20226
> gcd3: elapsed 20378       gcd3: elapsed 20378       gcd3: elapsed 20148
> gcd4: elapsed 20532       gcd4: elapsed 20918       gcd4: elapsed 20301
> PASS                      PASS                      PASS
> 
> 
> gcd -r 1000 -n 100
>                                             
> gcd0: elapsed 245873      gcd0: elapsed 252957      gcd0: elapsed 245571
> gcd1: elapsed 191290      gcd1: elapsed 198345      gcd1: elapsed 192513
> gcd2: elapsed 192672      gcd2: elapsed 199579      gcd2: elapsed 192978
> gcd3: elapsed 191366      gcd3: elapsed 198728      gcd3: elapsed 192283
> gcd4: elapsed 193134      gcd4: elapsed 200884      gcd4: elapsed 193669
> PASS                      PASS                      PASS
> 
> gcd0: elapsed 245180      gcd0: elapsed 251113      gcd0: elapsed 250573
> gcd1: elapsed 191755      gcd1: elapsed 196800      gcd1: elapsed 194729
> gcd2: elapsed 192286      gcd2: elapsed 198654      gcd2: elapsed 195574
> gcd3: elapsed 191601      gcd3: elapsed 197344      gcd3: elapsed 194965
> gcd4: elapsed 193135      gcd4: elapsed 200268      gcd4: elapsed 197037
> PASS                      PASS                      PASS
> 
> gcd0: elapsed 243412      gcd0: elapsed 252189      gcd0: elapsed 247876
> gcd1: elapsed 190447      gcd1: elapsed 197192      gcd1: elapsed 193355
> gcd2: elapsed 192288      gcd2: elapsed 199042      gcd2: elapsed 193437
> gcd3: elapsed 190755      gcd3: elapsed 198957      gcd3: elapsed 193660
> gcd4: elapsed 192672      gcd4: elapsed 200346      gcd4: elapsed 194586
> PASS                      PASS                      PASS
> 
> 
> gcd -n 1000
> 
> gcd0: elapsed 2636655     gcd0: elapsed 2701340     gcd0: elapsed 2622109
> gcd1: elapsed 2055411     gcd1: elapsed 2153446     gcd1: elapsed 2053342
> gcd2: elapsed 2064420     gcd2: elapsed 2162496     gcd2: elapsed 2066503
> gcd3: elapsed 2055151     gcd3: elapsed 2163201     gcd3: elapsed 2055161
> gcd4: elapsed 2071591     gcd4: elapsed 2171636     gcd4: elapsed 2074488
> PASS                      PASS                      PASS
> 
> gcd0: elapsed 2636512     gcd0: elapsed 2719436     gcd0: elapsed 2613575
> gcd1: elapsed 2060157     gcd1: elapsed 2159284     gcd1: elapsed 2046187
> gcd2: elapsed 2069242     gcd2: elapsed 2163944     gcd2: elapsed 2056430
> gcd3: elapsed 2060436     gcd3: elapsed 2166796     gcd3: elapsed 2046933
> gcd4: elapsed 2074188     gcd4: elapsed 2176243     gcd4: elapsed 2065170
> PASS                      PASS                      PASS
> 
> gcd0: elapsed 2614949     gcd0: elapsed 2708342     gcd0: elapsed 2632962
> gcd1: elapsed 2044957     gcd1: elapsed 2157985     gcd1: elapsed 2055475
> gcd2: elapsed 2054496     gcd2: elapsed 2170720     gcd2: elapsed 2068926
> gcd3: elapsed 2044838     gcd3: elapsed 2167954     gcd3: elapsed 2055305
> gcd4: elapsed 2059033     gcd4: elapsed 2176002     gcd4: elapsed 2079856
> PASS                      PASS                      PASS
> 
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel



