ILP32 for ARM64 - testing with lmbench

Zhangjian (Bamvor) bamvor.zhangjian at huawei.com
Wed Nov 16 23:48:04 PST 2016


Hi, Maxim

On 2016/11/17 13:02, Maxim Kuvyrkov wrote:
> Hi Bamvor,
>
> I'm surprised that you see this much difference from ILP32 patches on SPEC CPU2006int at all.  The SPEC CPU2006 benchmarks spend almost no time in the kernel syscalls.  I can imagine memory, TLB, and cache handling in the kernel could affect CPU2006 benchmarks.  Do ILP32 patches touch code in those areas?
>
> Other than that, it would be interesting to check what the variance is between the 3 iterations of benchmark runs.  Could you check what relative standard deviation is between the 3 iterations -- (STDEV(RUN1, RUN2, RUN3) / RUNselected)?
>
> For reference, in my [non-ILP32] benchmarking I see 1.1% for 401.bzip2,  0.8% for 429.mcf, 0.2% for 456.hmmer, and 0.1% for 462.libquantum.
Here are my results (relative standard deviation across the 3 iterations):
                     ILP32_merged    ILP32_unmerged
       401.bzip2            0.31%            0.26%
       429.mcf              1.61%            1.36%
       456.hmmer            1.37%            1.57%
       462.libquantum       0.29%            0.28%
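
For reference, the relative standard deviation asked about above can be computed roughly as follows. This is only a minimal sketch, not the script actually used for the table: it assumes STDEV is the sample standard deviation (as in a spreadsheet) and that RUNselected is the median of the three runs. The function name and the example run times are hypothetical.

```python
import statistics


def relative_stdev(runs):
    """Return STDEV(runs) / RUNselected for one benchmark.

    Assumptions (not confirmed by the thread): STDEV is the sample
    standard deviation, and the selected run is the median run.
    """
    selected = statistics.median(runs)
    return statistics.stdev(runs) / selected


if __name__ == "__main__":
    # Hypothetical run times in seconds for three iterations.
    example_runs = [98.0, 100.0, 102.0]
    print(f"{relative_stdev(example_runs):.2%}")
```

The same function works on SPEC ratios instead of run times, as long as all three values for a benchmark use the same unit.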

Regards

Bamvor

>
> --
> Maxim Kuvyrkov
> www.linaro.org
>
>
>
>> On Nov 17, 2016, at 7:28 AM, Zhangjian (Bamvor) <bamvor.zhangjian at huawei.com> wrote:
>>
>> Hi, all
>>
>> I tested SPECint for aarch64 LP64 with aarch32 el0 disabled and enabled,
>> respectively, and compared against the ILP32-unmerged kernel (4.8-rc6) on our
>> arm64 board. I found that the difference (ILP32 disabled vs. ILP32 unmerged) is
>> bigger when aarch32 el0 is enabled than when it is disabled. bzip2, mcf, hmmer
>> and libquantum show the four largest differences[1]. Note that bigger is better
>> in the SPECint test.
>>
>> To confirm the above results, I retested these four test cases in a
>> reportable way (see the command at the end). The result[2] shows that
>> libquantum drops by 2.09% with ILP32 enabled and aarch32 on. I think this is
>> insignificant.
>>
>> The lmbench results are not stable on my board. I plan to dig into it later.
>>
>> [1] The following results were obtained with --size=ref --iterations=3.
>> 1.1 Test when aarch32_el0 is enabled.
>>                        ILP32 disabled        base line
>>      400.perlbench            100.00%             100%
>>      401.bzip2                 99.35%             100%
>>      403.gcc                  100.26%             100%
>>      429.mcf                  102.75%             100%
>>      445.gobmk                100.00%             100%
>>      456.hmmer                 95.66%             100%
>>      458.sjeng                100.00%             100%
>>      462.libquantum           100.00%             100%
>>      471.omnetpp              100.59%             100%
>>      473.astar                 99.66%             100%
>>      483.xalancbmk             99.10%             100%
>>
>> 1.2 Test when aarch32_el0 is disabled
>>                        ILP32 disabled         base line
>>      400.perlbench            100.22%              100%
>>      401.bzip2                100.95%              100%
>>      403.gcc                  100.20%              100%
>>      429.mcf                  100.76%              100%
>>      445.gobmk                100.36%              100%
>>      456.hmmer                 97.94%              100%
>>      458.sjeng                 99.73%              100%
>>      462.libquantum            98.72%              100%
>>      471.omnetpp              100.86%              100%
>>      473.astar                 99.15%              100%
>>      483.xalancbmk            100.08%              100%
>>
>> [2] The following results were obtained with: runspec --config=my.cfg --size=test,train,ref --noreportable --tune=base,peak --iterations=3 bzip2 mcf hmmer libquantum
>> 2.1 Test when aarch32_el0 is enabled.
>>                         ILP32_enabled         base line
>>      401.bzip2                100.82%              100%
>>      429.mcf                  100.18%              100%
>>      456.hmmer                 99.64%              100%
>>      462.libquantum            97.91%              100%
>>
>> Regards
>>
>> Bamvor
>>
>> On 2016/10/28 20:46, Yury Norov wrote:
>>> [Add Steve Ellcey, thanks for testing on ThunderX]
>>>
>>> Lmbench-3.0-a9 testing was performed on a ThunderX machine to check that
>>> the ILP32 series does not add performance regressions for LP64. The test
>>> summary is in the table below. Our measurements don't show a
>>> significant performance regression of LP64 if ILP32 code is merged,
>>> whether enabled or disabled.
>>>
>>>               ILP32 enabled   ILP32  disabled   Standard Kernel
>>> null syscall   0.1066          0.1121            0.1121
>>>               95.09%          100.00%
>>>
>>> stat           1.3947          1.3814            1.3864
>>>               100.60%         99.64%
>>>
>>> fstat          0.4459          0.4344            0.4524
>>>               98.56%          96.02%
>>>
>>> open/close     4.0606          4.0411            4.0453
>>>               100.38%         99.90%
>>>
>>> read           0.4819          0.5014            0.5014
>>>               96.11%          100.00%
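
The percentage rows in the table above appear to be each measurement divided by the Standard Kernel value for the same test; the printed figures are consistent with that reading. A minimal sketch under that assumption (the function and variable names are hypothetical; the numbers are copied from the table):

```python
# Each entry: (ILP32 enabled, ILP32 disabled, Standard Kernel),
# copied from the lmbench table above.
LMBENCH_RESULTS = {
    "null syscall": (0.1066, 0.1121, 0.1121),
    "stat": (1.3947, 1.3814, 1.3864),
    "fstat": (0.4459, 0.4344, 0.4524),
    "open/close": (4.0606, 4.0411, 4.0453),
    "read": (0.4819, 0.5014, 0.5014),
}


def percent_of_standard(value, standard):
    """Express a measurement as a percentage of the standard-kernel value."""
    return round(100.0 * value / standard, 2)


if __name__ == "__main__":
    for test, (enabled, disabled, standard) in LMBENCH_RESULTS.items():
        print(
            f"{test:<14}"
            f"{percent_of_standard(enabled, standard):>8.2f}%"
            f"{percent_of_standard(disabled, standard):>8.2f}%"
        )
```

Values below 100% mean the measured figure is smaller than on the standard kernel.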
>>>
>>> Tested with linux 4.8 because 4.9-rc1 is not fixed yet for ThunderX.
>>> Other system details below.
>>>
>>> Yury.
>>>
>>> ubuntu at crb6:~$ uname -a
>>> Linux crb6 4.8.0+ #3 SMP Thu Oct 27 11:01:32 PDT 2016 aarch64 aarch64 aarch64 GNU/Linux
>>>
>>> ubuntu at crb6:~$ cat /proc/meminfo
>>> MemTotal:       132011948 kB
>>> MemFree:        131442672 kB
>>> MemAvailable:   130695764 kB
>>> Buffers:           15696 kB
>>> Cached:            88088 kB
>>> SwapCached:            0 kB
>>> Active:            82760 kB
>>> Inactive:          41336 kB
>>> Active(anon):      20880 kB
>>> Inactive(anon):     8576 kB
>>> Active(file):      61880 kB
>>> Inactive(file):    32760 kB
>>> Unevictable:           0 kB
>>> Mlocked:               0 kB
>>> SwapTotal:      128920572 kB
>>> SwapFree:       128920572 kB
>>> Dirty:                 0 kB
>>> Writeback:             0 kB
>>> AnonPages:         20544 kB
>>> Mapped:            19780 kB
>>> Shmem:              9060 kB
>>> Slab:              78804 kB
>>> SReclaimable:      27372 kB
>>> SUnreclaim:        51432 kB
>>> KernelStack:        8336 kB
>>> PageTables:          820 kB
>>> NFS_Unstable:          0 kB
>>> Bounce:                0 kB
>>> WritebackTmp:          0 kB
>>> CommitLimit:    194926544 kB
>>> Committed_AS:     256324 kB
>>> VmallocTotal:   135290290112 kB
>>> VmallocUsed:           0 kB
>>> VmallocChunk:          0 kB
>>> AnonHugePages:         0 kB
>>> ShmemHugePages:        0 kB
>>> ShmemPmdMapped:        0 kB
>>> CmaTotal:              0 kB
>>> CmaFree:               0 kB
>>> HugePages_Total:       0
>>> HugePages_Free:        0
>>> HugePages_Rsvd:        0
>>> HugePages_Surp:        0
>>> Hugepagesize:       2048 kB
>>>
>>> ubuntu at crb6:~$ cat /proc/cpuinfo
>>> processor	: 0
>>> BogoMIPS	: 200.00
>>> Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics
>>> CPU implementer	: 0x43
>>> CPU architecture: 8
>>> CPU variant	: 0x1
>>> CPU part	: 0x0a1
>>> CPU revision	: 0
>>>
>>> processor	: 1
>>> BogoMIPS	: 200.00
>>> Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics
>>> CPU implementer	: 0x43
>>> CPU architecture: 8
>>> CPU variant	: 0x1
>>> CPU part	: 0x0a1
>>> CPU revision	: 0
>>>
>>> processor	: 2
>>> BogoMIPS	: 200.00
>>> Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics
>>> CPU implementer	: 0x43
>>> CPU architecture: 8
>>> CPU variant	: 0x1
>>> CPU part	: 0x0a1
>>> CPU revision	: 0
>>>
>>> processor	: 3
>>> BogoMIPS	: 200.00
>>> Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics
>>> CPU implementer	: 0x43
>>> CPU architecture: 8
>>> CPU variant	: 0x1
>>> CPU part	: 0x0a1
>>> CPU revision	: 0
>>>
>>> processor	: 4
>>> BogoMIPS	: 200.00
>>> Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics
>>> CPU implementer	: 0x43
>>> CPU architecture: 8
>>> CPU variant	: 0x1
>>> CPU part	: 0x0a1
>>> CPU revision	: 0
>>>
>>> processor	: 5
>>> BogoMIPS	: 200.00
>>> Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics
>>> CPU implementer	: 0x43
>>> CPU architecture: 8
>>> CPU variant	: 0x1
>>> CPU part	: 0x0a1
>>> CPU revision	: 0
>>>
>>> processor	: 6
>>> BogoMIPS	: 200.00
>>> Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics
>>> CPU implementer	: 0x43
>>> CPU architecture: 8
>>> CPU variant	: 0x1
>>> CPU part	: 0x0a1
>>> CPU revision	: 0
>>>
>>> processor	: 7
>>> BogoMIPS	: 200.00
>>> Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics
>>> CPU implementer	: 0x43
>>> CPU architecture: 8
>>> CPU variant	: 0x1
>>> CPU part	: 0x0a1
>>> CPU revision	: 0
>>>
>>> processor	: 8
>>> BogoMIPS	: 200.00
>>> Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics
>>> CPU implementer	: 0x43
>>> CPU architecture: 8
>>> CPU variant	: 0x1
>>> CPU part	: 0x0a1
>>> CPU revision	: 0
>>>
>>> processor	: 9
>>> BogoMIPS	: 200.00
>>> Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics
>>> CPU implementer	: 0x43
>>> CPU architecture: 8
>>> CPU variant	: 0x1
>>> CPU part	: 0x0a1
>>> CPU revision	: 0
>>>
>>> processor	: 10
>>> BogoMIPS	: 200.00
>>> Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics
>>> CPU implementer	: 0x43
>>> CPU architecture: 8
>>> CPU variant	: 0x1
>>> CPU part	: 0x0a1
>>> CPU revision	: 0
>>>
>>> processor	: 11
>>> BogoMIPS	: 200.00
>>> Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics
>>> CPU implementer	: 0x43
>>> CPU architecture: 8
>>> CPU variant	: 0x1
>>> CPU part	: 0x0a1
>>> CPU revision	: 0
>>>
>>> processor	: 12
>>> BogoMIPS	: 200.00
>>> Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics
>>> CPU implementer	: 0x43
>>> CPU architecture: 8
>>> CPU variant	: 0x1
>>> CPU part	: 0x0a1
>>> CPU revision	: 0
>>>
>>> processor	: 13
>>> BogoMIPS	: 200.00
>>> Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics
>>> CPU implementer	: 0x43
>>> CPU architecture: 8
>>> CPU variant	: 0x1
>>> CPU part	: 0x0a1
>>> CPU revision	: 0
>>>
>>> processor	: 14
>>> BogoMIPS	: 200.00
>>> Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics
>>> CPU implementer	: 0x43
>>> CPU architecture: 8
>>> CPU variant	: 0x1
>>> CPU part	: 0x0a1
>>> CPU revision	: 0
>>>
>>> processor	: 15
>>> BogoMIPS	: 200.00
>>> Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics
>>> CPU implementer	: 0x43
>>> CPU architecture: 8
>>> CPU variant	: 0x1
>>> CPU part	: 0x0a1
>>> CPU revision	: 0
>>>
>>> processor	: 16
>>> BogoMIPS	: 200.00
>>> Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics
>>> CPU implementer	: 0x43
>>> CPU architecture: 8
>>> CPU variant	: 0x1
>>> CPU part	: 0x0a1
>>> CPU revision	: 0
>>>
>>> processor	: 17
>>> BogoMIPS	: 200.00
>>> Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics
>>> CPU implementer	: 0x43
>>> CPU architecture: 8
>>> CPU variant	: 0x1
>>> CPU part	: 0x0a1
>>> CPU revision	: 0
>>>
>>> processor	: 18
>>> BogoMIPS	: 200.00
>>> Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics
>>> CPU implementer	: 0x43
>>> CPU architecture: 8
>>> CPU variant	: 0x1
>>> CPU part	: 0x0a1
>>> CPU revision	: 0
>>>
>>> processor	: 19
>>> BogoMIPS	: 200.00
>>> Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics
>>> CPU implementer	: 0x43
>>> CPU architecture: 8
>>> CPU variant	: 0x1
>>> CPU part	: 0x0a1
>>> CPU revision	: 0
>>>
>>> processor	: 20
>>> BogoMIPS	: 200.00
>>> Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics
>>> CPU implementer	: 0x43
>>> CPU architecture: 8
>>> CPU variant	: 0x1
>>> CPU part	: 0x0a1
>>> CPU revision	: 0
>>>
>>> processor	: 21
>>> BogoMIPS	: 200.00
>>> Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics
>>> CPU implementer	: 0x43
>>> CPU architecture: 8
>>> CPU variant	: 0x1
>>> CPU part	: 0x0a1
>>> CPU revision	: 0
>>>
>>> processor	: 22
>>> BogoMIPS	: 200.00
>>> Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics
>>> CPU implementer	: 0x43
>>> CPU architecture: 8
>>> CPU variant	: 0x1
>>> CPU part	: 0x0a1
>>> CPU revision	: 0
>>>
>>> processor	: 23
>>> BogoMIPS	: 200.00
>>> Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics
>>> CPU implementer	: 0x43
>>> CPU architecture: 8
>>> CPU variant	: 0x1
>>> CPU part	: 0x0a1
>>> CPU revision	: 0
>>>
>>> processor	: 24
>>> BogoMIPS	: 200.00
>>> Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics
>>> CPU implementer	: 0x43
>>> CPU architecture: 8
>>> CPU variant	: 0x1
>>> CPU part	: 0x0a1
>>> CPU revision	: 0
>>>
>>> processor	: 25
>>> BogoMIPS	: 200.00
>>> Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics
>>> CPU implementer	: 0x43
>>> CPU architecture: 8
>>> CPU variant	: 0x1
>>> CPU part	: 0x0a1
>>> CPU revision	: 0
>>>
>>> processor	: 26
>>> BogoMIPS	: 200.00
>>> Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics
>>> CPU implementer	: 0x43
>>> CPU architecture: 8
>>> CPU variant	: 0x1
>>> CPU part	: 0x0a1
>>> CPU revision	: 0
>>>
>>> processor	: 27
>>> BogoMIPS	: 200.00
>>> Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics
>>> CPU implementer	: 0x43
>>> CPU architecture: 8
>>> CPU variant	: 0x1
>>> CPU part	: 0x0a1
>>> CPU revision	: 0
>>>
>>> processor	: 28
>>> BogoMIPS	: 200.00
>>> Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics
>>> CPU implementer	: 0x43
>>> CPU architecture: 8
>>> CPU variant	: 0x1
>>> CPU part	: 0x0a1
>>> CPU revision	: 0
>>>
>>> processor	: 29
>>> BogoMIPS	: 200.00
>>> Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics
>>> CPU implementer	: 0x43
>>> CPU architecture: 8
>>> CPU variant	: 0x1
>>> CPU part	: 0x0a1
>>> CPU revision	: 0
>>>
>>> processor	: 30
>>> BogoMIPS	: 200.00
>>> Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics
>>> CPU implementer	: 0x43
>>> CPU architecture: 8
>>> CPU variant	: 0x1
>>> CPU part	: 0x0a1
>>> CPU revision	: 0
>>>
>>> processor	: 31
>>> BogoMIPS	: 200.00
>>> Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics
>>> CPU implementer	: 0x43
>>> CPU architecture: 8
>>> CPU variant	: 0x1
>>> CPU part	: 0x0a1
>>> CPU revision	: 0
>>>
>>> processor	: 32
>>> BogoMIPS	: 200.00
>>> Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics
>>> CPU implementer	: 0x43
>>> CPU architecture: 8
>>> CPU variant	: 0x1
>>> CPU part	: 0x0a1
>>> CPU revision	: 0
>>>
>>> processor	: 33
>>> BogoMIPS	: 200.00
>>> Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics
>>> CPU implementer	: 0x43
>>> CPU architecture: 8
>>> CPU variant	: 0x1
>>> CPU part	: 0x0a1
>>> CPU revision	: 0
>>>
>>> processor	: 34
>>> BogoMIPS	: 200.00
>>> Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics
>>> CPU implementer	: 0x43
>>> CPU architecture: 8
>>> CPU variant	: 0x1
>>> CPU part	: 0x0a1
>>> CPU revision	: 0
>>>
>>> processor	: 35
>>> BogoMIPS	: 200.00
>>> Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics
>>> CPU implementer	: 0x43
>>> CPU architecture: 8
>>> CPU variant	: 0x1
>>> CPU part	: 0x0a1
>>> CPU revision	: 0
>>>
>>> processor	: 36
>>> BogoMIPS	: 200.00
>>> Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics
>>> CPU implementer	: 0x43
>>> CPU architecture: 8
>>> CPU variant	: 0x1
>>> CPU part	: 0x0a1
>>> CPU revision	: 0
>>>
>>> processor	: 37
>>> BogoMIPS	: 200.00
>>> Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics
>>> CPU implementer	: 0x43
>>> CPU architecture: 8
>>> CPU variant	: 0x1
>>> CPU part	: 0x0a1
>>> CPU revision	: 0
>>>
>>> processor	: 38
>>> BogoMIPS	: 200.00
>>> Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics
>>> CPU implementer	: 0x43
>>> CPU architecture: 8
>>> CPU variant	: 0x1
>>> CPU part	: 0x0a1
>>> CPU revision	: 0
>>>
>>> processor	: 39
>>> BogoMIPS	: 200.00
>>> Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics
>>> CPU implementer	: 0x43
>>> CPU architecture: 8
>>> CPU variant	: 0x1
>>> CPU part	: 0x0a1
>>> CPU revision	: 0
>>>
>>> processor	: 40
>>> BogoMIPS	: 200.00
>>> Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics
>>> CPU implementer	: 0x43
>>> CPU architecture: 8
>>> CPU variant	: 0x1
>>> CPU part	: 0x0a1
>>> CPU revision	: 0
>>>
>>> processor	: 41
>>> BogoMIPS	: 200.00
>>> Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics
>>> CPU implementer	: 0x43
>>> CPU architecture: 8
>>> CPU variant	: 0x1
>>> CPU part	: 0x0a1
>>> CPU revision	: 0
>>>
>>> processor	: 42
>>> BogoMIPS	: 200.00
>>> Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics
>>> CPU implementer	: 0x43
>>> CPU architecture: 8
>>> CPU variant	: 0x1
>>> CPU part	: 0x0a1
>>> CPU revision	: 0
>>>
>>> processor	: 43
>>> BogoMIPS	: 200.00
>>> Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics
>>> CPU implementer	: 0x43
>>> CPU architecture: 8
>>> CPU variant	: 0x1
>>> CPU part	: 0x0a1
>>> CPU revision	: 0
>>>
>>> processor	: 44
>>> BogoMIPS	: 200.00
>>> Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics
>>> CPU implementer	: 0x43
>>> CPU architecture: 8
>>> CPU variant	: 0x1
>>> CPU part	: 0x0a1
>>> CPU revision	: 0
>>>
>>> processor	: 45
>>> BogoMIPS	: 200.00
>>> Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics
>>> CPU implementer	: 0x43
>>> CPU architecture: 8
>>> CPU variant	: 0x1
>>> CPU part	: 0x0a1
>>> CPU revision	: 0
>>>
>>> processor	: 46
>>> BogoMIPS	: 200.00
>>> Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics
>>> CPU implementer	: 0x43
>>> CPU architecture: 8
>>> CPU variant	: 0x1
>>> CPU part	: 0x0a1
>>> CPU revision	: 0
>>>
>>> processor	: 47
>>> BogoMIPS	: 200.00
>>> Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics
>>> CPU implementer	: 0x43
>>> CPU architecture: 8
>>> CPU variant	: 0x1
>>> CPU part	: 0x0a1
>>> CPU revision	: 0
>>>
>>
>



