lib/GCD.c regression on arm
Jisheng Zhang
jszhang at marvell.com
Mon Jul 18 05:15:49 PDT 2016
Dear Cheah,
On Fri, 15 Jul 2016 21:51:10 +0800 Cheah Kok Cheong wrote:
> Commit fff7fb0b2d90 ("lib/GCD.c: use binary GCD algorithm instead of Euclidean")
> replaced the Euclidean algorithm entirely with the binary algorithm.
> Two variants were provided and selected via Kconfig depending on whether
> a fast __ffs (find least significant set bit) instruction is available.
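For readers comparing the approaches, here is a userspace sketch of the two ideas under discussion: the classic Euclidean loop and a binary GCD that strips all trailing zeros with a single find-first-set step. It is not the kernel's lib/gcd.c. The function names are hypothetical, and `__builtin_ctzl` (a GCC/Clang builtin) stands in for the kernel's `__ffs()`.

```c
#include <assert.h>

/* Hypothetical stand-in for the kernel's __ffs(): index of the least
 * significant set bit. Undefined for x == 0, just like __ffs(). */
static unsigned long sketch_ffs(unsigned long x)
{
	assert(x != 0);
	return (unsigned long)__builtin_ctzl(x);
}

/* Euclidean GCD: one division (remainder) per step. */
unsigned long gcd_euclid(unsigned long a, unsigned long b)
{
	while (b) {
		unsigned long t = a % b;

		a = b;
		b = t;
	}
	return a;
}

/* Binary GCD: subtraction plus one find-first-set per step to remove
 * all trailing zeros at once. */
unsigned long gcd_binary_ffs(unsigned long a, unsigned long b)
{
	unsigned long shift;

	if (!a || !b)
		return a | b;                  /* gcd(x, 0) == x */

	shift = sketch_ffs(a | b);         /* common factors of two */
	b >>= sketch_ffs(b);               /* b is now odd */

	for (;;) {
		a >>= sketch_ffs(a);           /* a is now odd */
		if (a == b)
			return a << shift;
		if (a < b) {                   /* keep a > b */
			unsigned long t = a;

			a = b;
			b = t;
		}
		a -= b;                        /* odd - odd: even and nonzero */
	}
}
```

Both functions return `gcd(x, 0) == x` and `gcd(0, 0) == 0`. The binary version trades divisions for subtractions and shifts, so its speed depends heavily on how cheap the find-first-set step is.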
>
> For ARMv5 and above, the fast __ffs version is used, as is evident in
> arch/arm/mm/Kconfig.
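The "fast __ffs" assumption for ARMv5 and above presumably rests on the CLZ (count leading zeros) instruction that ARMv5 introduced: isolating the lowest set bit with `x & -x` turns find-first-set into a leading-zero count. A sketch of that identity, with hypothetical helper names and `__builtin_clzl` (GCC/Clang) standing in for a hardware CLZ:

```c
#include <assert.h>
#include <limits.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Index of the most significant set bit, computed with a leading-zero
 * count (a compiler can emit a single CLZ for this on ARMv5+). */
static unsigned long sketch_fls_index(unsigned long x)
{
	assert(x != 0);                    /* clz(0) is undefined */
	return SKETCH_BITS_PER_LONG - 1 - (unsigned long)__builtin_clzl(x);
}

/* Find-first-set via CLZ: x & -x keeps only the lowest set bit, so its
 * MSB index is also its LSB index. */
unsigned long sketch_ffs_via_clz(unsigned long x)
{
	return sketch_fls_index(x & -x);
}
```

Without CLZ (ARMv4 and earlier) the same result needs a loop or a table lookup, which is why those cores are the ones treated as lacking an efficient ffs.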
>
> I benchmarked gcd performance using the code provided in the commit
> on a Cortex-A9-based MediaTek MT6577, with three runs at each of three
> settings.
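The commit's benchmark program is not reproduced in this thread. As an illustration only, a harness in the same spirit might time one implementation over a fixed pseudo-random input stream and then validate every result against a reference, which is presumably what the PASS lines report. `bench_gcd()`, `gcd_ref()` and the xorshift generator are assumptions, not the commit's code; `clock_gettime(CLOCK_MONOTONIC, ...)` is POSIX.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <time.h>

typedef unsigned long (*gcd_fn)(unsigned long, unsigned long);

/* Hypothetical reference (Euclidean) GCD used only to validate results. */
unsigned long gcd_ref(unsigned long a, unsigned long b)
{
	while (b) {
		unsigned long t = a % b;

		a = b;
		b = t;
	}
	return a;
}

/* xorshift64: tiny deterministic PRNG, so every variant sees the same
 * input pairs for a given seed. The state must be nonzero. */
static unsigned long long xorshift64(unsigned long long *s)
{
	*s ^= *s << 13;
	*s ^= *s >> 7;
	*s ^= *s << 17;
	return *s;
}

/* Time `rounds` calls of fn on pseudo-random pairs, then replay the same
 * pairs outside the timed region and check them against gcd_ref().
 * Returns elapsed nanoseconds, or -1 if any result is wrong. */
long long bench_gcd(gcd_fn fn, unsigned long rounds, unsigned long long seed)
{
	struct timespec t0, t1;
	unsigned long long s;
	volatile unsigned long sink = 0;   /* keeps the calls from being optimized away */
	unsigned long i;

	assert(fn != NULL);
	if (seed == 0)
		seed = 1;

	s = seed;
	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < rounds; i++) {
		unsigned long a = (unsigned long)xorshift64(&s);
		unsigned long b = (unsigned long)xorshift64(&s);

		sink += fn(a, b);
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);
	(void)sink;

	s = seed;                          /* replay the same pairs */
	for (i = 0; i < rounds; i++) {
		unsigned long a = (unsigned long)xorshift64(&s);
		unsigned long b = (unsigned long)xorshift64(&s);

		if (fn(a, b) != gcd_ref(a, b))
			return -1;
	}

	return (long long)(t1.tv_sec - t0.tv_sec) * 1000000000LL
	       + (long long)(t1.tv_nsec - t0.tv_nsec);
}
```

Calling `bench_gcd()` once per variant with the same seed keeps the inputs identical across variants, so the elapsed times are comparable, and validation cost never leaks into the measurement.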
>
> The fast __ffs binary variant is slower than the Euclidean algorithm.
> The non-ffs (even/odd) variant performs comparably to the Euclidean
> algorithm.
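A sketch of what an even/odd binary GCD can look like when no fast `__ffs` is available: trailing zeros are removed with shift loops, and after each subtraction the difference is shifted right twice, with `b` added in between whenever the first shift leaves an odd multiple, so every step strips at least two factors of two. The function name is hypothetical and this is an illustration, not a copy of lib/gcd.c.

```c
#include <assert.h>

/* Hypothetical even/odd binary GCD without find-first-set. */
unsigned long gcd_binary_evenodd(unsigned long a, unsigned long b)
{
	unsigned long r = a | b;

	if (!a || !b)
		return r;                      /* gcd(x, 0) == x */

	r &= -r;                           /* lowest set bit: common power of two */

	while (!(b & r))
		b >>= 1;                       /* b = r * odd */
	if (b == r)
		return r;

	for (;;) {
		while (!(a & r))
			a >>= 1;                   /* a = r * odd */
		if (a == r)
			return r;
		if (a == b)
			return a;

		if (a < b) {                   /* keep a > b */
			unsigned long t = a;

			a = b;
			b = t;
		}
		a -= b;                        /* r * even */
		a >>= 1;                       /* r * k */
		if (a & r)
			a += b;                    /* k odd: adding r * odd makes it even */
		a >>= 1;
		assert(a != 0);                /* both paths leave a positive multiple of r */
	}
}
```

Each loop iteration uses only compares, subtractions and shifts, which is why this form can compete with Euclid on cores where `__ffs` is not a single instruction.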
Interesting. Using the code in the commit, I get the following results
on a Cortex-A53 (CA53) platform.
Built with the aarch64 toolchain, -O2 -mcpu=cortex-a53:
~ # /a53 -r 500000 -n 10
gcd0: elapsed 10170
gcd1: elapsed 11340
gcd2: elapsed 13590
gcd3: elapsed 11700
gcd4: elapsed 14230
PASS
Built with the armhf toolchain, -O2 -mcpu=cortex-a53:
~ # /a53_32 -r 500000 -n 10
gcd0: elapsed 9490
gcd1: elapsed 10220
gcd2: elapsed 10790
gcd3: elapsed 10270
gcd4: elapsed 10850
PASS
>
> It will be interesting to see whether this also holds for other
> ARMv5-and-above platforms. Hopefully others will do some testing.
> If this is the case then we should "select CPU_NO_EFFICIENT_FFS" in our
> Kconfig.
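Selecting CPU_NO_EFFICIENT_FFS only matters at compile time. Assuming lib/gcd.c keys its choice off the resulting `CONFIG_CPU_NO_EFFICIENT_FFS` symbol, the difference between the two variants boils down to how trailing zeros are stripped. A minimal sketch (the helper name is hypothetical) that can be built with or without `-DCONFIG_CPU_NO_EFFICIENT_FFS` to emulate either kind of architecture:

```c
#include <assert.h>

/* Hypothetical helper: remove every factor of two from a nonzero value.
 * Which body is compiled depends on CONFIG_CPU_NO_EFFICIENT_FFS, which
 * (by assumption) is how lib/gcd.c chooses its variant. */
unsigned long sketch_odd_part(unsigned long x)
{
	assert(x != 0);                    /* neither form handles 0 */
#if !defined(CONFIG_CPU_NO_EFFICIENT_FFS)
	/* Fast find-first-set available: shift out all trailing zeros at once. */
	return x >> __builtin_ctzl(x);
#else
	/* No efficient ffs: one loop iteration per trailing zero. */
	while (!(x & 1))
		x >>= 1;
	return x;
#endif
}
```

Both builds return the same value; only the cost per call differs, which is exactly what the benchmarks in this thread are measuring.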
>
> Thanks.
> Best Regards,
> Cheah
>
> Cross-compiled with '-O2'.
>
> elapsed         Euclidean  Binary with ffs    Binary no ffs
>
>
> gcd -r 50000 -n 10
>
> gcd0                25766            25766            25765
> gcd1                19994            20224            19843
> gcd2                20071            20533            20151
> gcd3                20070            20380            19919
> gcd4                20148            20610            20151
>                      PASS             PASS             PASS
>
> gcd0                26690            26612            24381
> gcd1                20224            20379            19765
> gcd2                20224            20304            19842
> gcd3                20148            20302            19919
> gcd4                20301            20302            19919
>                      PASS             PASS             PASS
>
> gcd0                25842            26459            25457
> gcd1                20454            20532            20225
> gcd2                20378            20762            20226
> gcd3                20378            20378            20148
> gcd4                20532            20918            20301
>                      PASS             PASS             PASS
>
>
> gcd -r 1000 -n 100
>
> gcd0               245873           252957           245571
> gcd1               191290           198345           192513
> gcd2               192672           199579           192978
> gcd3               191366           198728           192283
> gcd4               193134           200884           193669
>                      PASS             PASS             PASS
>
> gcd0               245180           251113           250573
> gcd1               191755           196800           194729
> gcd2               192286           198654           195574
> gcd3               191601           197344           194965
> gcd4               193135           200268           197037
>                      PASS             PASS             PASS
>
> gcd0               243412           252189           247876
> gcd1               190447           197192           193355
> gcd2               192288           199042           193437
> gcd3               190755           198957           193660
> gcd4               192672           200346           194586
>                      PASS             PASS             PASS
>
>
> gcd -n 1000
>
> gcd0              2636655          2701340          2622109
> gcd1              2055411          2153446          2053342
> gcd2              2064420          2162496          2066503
> gcd3              2055151          2163201          2055161
> gcd4              2071591          2171636          2074488
>                      PASS             PASS             PASS
>
> gcd0              2636512          2719436          2613575
> gcd1              2060157          2159284          2046187
> gcd2              2069242          2163944          2056430
> gcd3              2060436          2166796          2046933
> gcd4              2074188          2176243          2065170
>                      PASS             PASS             PASS
>
> gcd0              2614949          2708342          2632962
> gcd1              2044957          2157985          2055475
> gcd2              2054496          2170720          2068926
> gcd3              2044838          2167954          2055305
> gcd4              2059033          2176002          2079856
>                      PASS             PASS             PASS
>
>
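To put numbers on "slower" and "comparable", here is a small worked calculation over the gcd1 rows of the three `gcd -n 1000` runs above (values copied from the table; the helper names are illustrative). It gives a mean ratio of about 1.050 for the __ffs variant and about 0.999 for the even/odd variant relative to Euclidean, i.e. roughly 5% slower versus essentially unchanged.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n samples. */
static double mean(const double *v, size_t n)
{
	double sum = 0.0;
	size_t i;

	assert(n > 0);
	for (i = 0; i < n; i++)
		sum += v[i];
	return sum / (double)n;
}

/* Ratio of a variant's mean elapsed time to the Euclidean baseline:
 * a value above 1.0 means the variant is slower. */
double relative_time(const double *variant, const double *baseline, size_t n)
{
	return mean(variant, n) / mean(baseline, n);
}

/* gcd1 rows of the three "gcd -n 1000" runs, per column. */
const double gcd1_euclid[3] = { 2055411, 2060157, 2044957 };
const double gcd1_ffs[3]    = { 2153446, 2159284, 2157985 };
const double gcd1_no_ffs[3] = { 2053342, 2046187, 2055475 };
```

The other gcdN rows of that setting show the same pattern, so gcd1 is representative rather than cherry-picked.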
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel