lib/GCD.c regression on arm
Cheah Kok Cheong
thrust73 at gmail.com
Fri Jul 15 06:51:10 PDT 2016
Commit fff7fb0b2d90 ("lib/GCD.c: use binary GCD algorithm instead of Euclidean")
replaced the Euclidean algorithm totally with the Binary algorithm.
Two variants were provided and selected via Kconfig depending on whether
a fast __ffs (find least significant set bit) instruction is available.
For arm v5 and above the fast __ffs version is used as evident in
arch/arm/mm/Kconfig.
I benchmarked the gcd performance using the code provided in the commit
with a Cortex-A9 based Mediatek MT6577. Three runs at different settings
were used.
The performance with fast __ffs Binary algo is slower than the Euclidean
algo. Using the non ffs version [even/odd variant] gives a comparable
performance as the Euclidean algo.
Will be interesting to see whether this is also true for other platforms
with arm v5 and above? Hopefully others will do some testing.
If this is the case then we should "select CPU_NO_EFFICIENT_FFS" in our
Kconfig.
Thanks.
Best Regards,
Cheah
cross compiled with '-O2'
Euclidean Binary with ffs Binary no ffs
gcd -r 50000 -n 10
gcd0: elapsed 25766 gcd0: elapsed 25766 gcd0: elapsed 25765
gcd1: elapsed 19994 gcd1: elapsed 20224 gcd1: elapsed 19843
gcd2: elapsed 20071 gcd2: elapsed 20533 gcd2: elapsed 20151
gcd3: elapsed 20070 gcd3: elapsed 20380 gcd3: elapsed 19919
gcd4: elapsed 20148 gcd4: elapsed 20610 gcd4: elapsed 20151
PASS PASS PASS
gcd0: elapsed 26690 gcd0: elapsed 26612 gcd0: elapsed 24381
gcd1: elapsed 20224 gcd1: elapsed 20379 gcd1: elapsed 19765
gcd2: elapsed 20224 gcd2: elapsed 20304 gcd2: elapsed 19842
gcd3: elapsed 20148 gcd3: elapsed 20302 gcd3: elapsed 19919
gcd4: elapsed 20301 gcd4: elapsed 20302 gcd4: elapsed 19919
PASS PASS PASS
gcd0: elapsed 25842 gcd0: elapsed 26459 gcd0: elapsed 25457
gcd1: elapsed 20454 gcd1: elapsed 20532 gcd1: elapsed 20225
gcd2: elapsed 20378 gcd2: elapsed 20762 gcd2: elapsed 20226
gcd3: elapsed 20378 gcd3: elapsed 20378 gcd3: elapsed 20148
gcd4: elapsed 20532 gcd4: elapsed 20918 gcd4: elapsed 20301
PASS PASS PASS
gcd -r 1000 -n 100
gcd0: elapsed 245873 gcd0: elapsed 252957 gcd0: elapsed 245571
gcd1: elapsed 191290 gcd1: elapsed 198345 gcd1: elapsed 192513
gcd2: elapsed 192672 gcd2: elapsed 199579 gcd2: elapsed 192978
gcd3: elapsed 191366 gcd3: elapsed 198728 gcd3: elapsed 192283
gcd4: elapsed 193134 gcd4: elapsed 200884 gcd4: elapsed 193669
PASS PASS PASS
gcd0: elapsed 245180 gcd0: elapsed 251113 gcd0: elapsed 250573
gcd1: elapsed 191755 gcd1: elapsed 196800 gcd1: elapsed 194729
gcd2: elapsed 192286 gcd2: elapsed 198654 gcd2: elapsed 195574
gcd3: elapsed 191601 gcd3: elapsed 197344 gcd3: elapsed 194965
gcd4: elapsed 193135 gcd4: elapsed 200268 gcd4: elapsed 197037
PASS PASS PASS
gcd0: elapsed 243412 gcd0: elapsed 252189 gcd0: elapsed 247876
gcd1: elapsed 190447 gcd1: elapsed 197192 gcd1: elapsed 193355
gcd2: elapsed 192288 gcd2: elapsed 199042 gcd2: elapsed 193437
gcd3: elapsed 190755 gcd3: elapsed 198957 gcd3: elapsed 193660
gcd4: elapsed 192672 gcd4: elapsed 200346 gcd4: elapsed 194586
PASS PASS PASS
gcd -n 1000
gcd0: elapsed 2636655 gcd0: elapsed 2701340 gcd0: elapsed 2622109
gcd1: elapsed 2055411 gcd1: elapsed 2153446 gcd1: elapsed 2053342
gcd2: elapsed 2064420 gcd2: elapsed 2162496 gcd2: elapsed 2066503
gcd3: elapsed 2055151 gcd3: elapsed 2163201 gcd3: elapsed 2055161
gcd4: elapsed 2071591 gcd4: elapsed 2171636 gcd4: elapsed 2074488
PASS PASS PASS
gcd0: elapsed 2636512 gcd0: elapsed 2719436 gcd0: elapsed 2613575
gcd1: elapsed 2060157 gcd1: elapsed 2159284 gcd1: elapsed 2046187
gcd2: elapsed 2069242 gcd2: elapsed 2163944 gcd2: elapsed 2056430
gcd3: elapsed 2060436 gcd3: elapsed 2166796 gcd3: elapsed 2046933
gcd4: elapsed 2074188 gcd4: elapsed 2176243 gcd4: elapsed 2065170
PASS PASS PASS
gcd0: elapsed 2614949 gcd0: elapsed 2708342 gcd0: elapsed 2632962
gcd1: elapsed 2044957 gcd1: elapsed 2157985 gcd1: elapsed 2055475
gcd2: elapsed 2054496 gcd2: elapsed 2170720 gcd2: elapsed 2068926
gcd3: elapsed 2044838 gcd3: elapsed 2167954 gcd3: elapsed 2055305
gcd4: elapsed 2059033 gcd4: elapsed 2176002 gcd4: elapsed 2079856
PASS PASS PASS
More information about the linux-arm-kernel
mailing list