[LEDE-DEV] [PATCH] openssl: Enable assembler optimizations for aarch64

Sat Oct 28 01:05:20 PDT 2017

The awesome AES performance was too good to be true: data encrypted on
the pine64 does not decrypt correctly on an x86_64 machine :(
Possibly the compiler optimizes away some of the assembler, which would
explain why it is so fast.  Please don't merge this until I have
investigated.

SHA does seem to give correct results though (and is really fast).
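
A known-answer test would let each machine be checked on its own, without
shipping ciphertext between them.  Below is a minimal sketch, assuming
Python 3 and the "openssl" binary are available on the device; the helper
names are made up for illustration.  It encrypts the first block of the
CBC-AES128 example from NIST SP 800-38A (section F.2.1) with "openssl enc"
and compares the result with the published ciphertext.  As far as I know
"openssl enc" goes through EVP, which is the code path that shows the
large AES speedup.

```python
import subprocess

# Test vector: NIST SP 800-38A, section F.2.1 (CBC-AES128.Encrypt), block 1.
KEY = "2b7e151628aed2a6abf7158809cf4f3c"
IV = "000102030405060708090a0b0c0d0e0f"
PLAINTEXT = bytes.fromhex("6bc1bee22e409f96e93d7e117393172a")
EXPECTED = bytes.fromhex("7649abac8119b246cee98e9b12e9197d")


def build_command(binary="openssl"):
    """Return the argv for a raw, unpadded AES-128-CBC encryption."""
    # -K and -iv take hex strings; -nopad keeps the output at exactly one block.
    return [binary, "enc", "-aes-128-cbc", "-e", "-nopad", "-K", KEY, "-iv", IV]


def run_openssl(argv, data):
    """Feed data to the command on stdin and return its stdout as bytes."""
    result = subprocess.run(argv, input=data, stdout=subprocess.PIPE, check=True)
    return result.stdout


def aes_kat_passes(runner=run_openssl):
    """Return True if the produced ciphertext matches the published vector."""
    # The runner is a parameter (hypothetical design choice) so that the
    # comparison logic can be exercised without an openssl binary.
    return runner(build_command(), PLAINTEXT) == EXPECTED
```

Calling aes_kat_passes() on the pine64 and on the x86_64 machine should
return True on both; a False result on one of them would point at the
broken build.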

On 27-10-17, Baptiste Jonglez wrote:
> OpenSSL is built with the generic linux settings for most targets,
> including aarch64.  These generic settings are designed for 32-bit CPUs
> and provide no assembler optimization: this is highly suboptimal for
> aarch64.
> 
> This patch simply switches to the aarch64 settings that are already
> available in OpenSSL.
> 
> Here is the output of "openssl speed" before the optimization, with
> "(...)" representing build flags that didn't change:
> 
>     OpenSSL 1.0.2l  25 May 2017
>     options:bn(64,32) rc4(ptr,char) des(idx,cisc,2,int) aes(partial) blowfish(ptr)
>     compiler: aarch64-openwrt-linux-musl-gcc  (...)
> 
> And after this patch, OpenSSL uses 64 bit mode and assembler optimizations:
> 
>     OpenSSL 1.0.2l  25 May 2017
>     options:bn(64,64) rc4(ptr,char) des(idx,cisc,2,int) aes(partial) blowfish(ptr)
>     compiler: aarch64-openwrt-linux-musl-gcc  (...)  -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM
> 
> Here are some benchmarks on a pine64+ running latest LEDE master r5142-20d363aed3:
> 
>     before# openssl speed sha aes blowfish
>     The 'numbers' are in 1000s of bytes per second processed.
>     type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
>     sha1              3918.89k     9982.43k    19148.03k    24933.03k    27325.78k
>     sha256            4604.51k    10240.64k    17472.51k    21355.18k    22801.07k
>     sha512            3662.19k    14539.41k    21443.16k    29544.11k    33177.60k
>     blowfish cbc     16266.63k    16940.86k    17176.92k    17237.33k    17252.35k
>     aes-128 cbc      19712.95k    21447.40k    22091.09k    22258.35k    22304.09k
>     aes-192 cbc      17680.12k    19064.47k    19572.14k    19703.13k    19737.26k
>     aes-256 cbc      15986.67k    17132.48k    17537.28k    17657.17k    17689.26k
> 
>     after# openssl speed sha aes blowfish
>     type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
>     sha1              6770.87k    26172.80k    86878.38k   205649.58k   345978.20k
>     sha256           20913.93k    74663.85k   184658.18k   290891.09k   351032.66k
>     sha512            7633.10k    30110.14k    50083.24k    71883.43k    82485.25k
>     blowfish cbc     16224.93k    16933.55k    17173.76k    17234.94k    17252.35k
>     aes-128 cbc      19425.74k    21193.31k    22065.74k    22304.77k    22380.54k
>     aes-192 cbc      17452.29k    18883.84k    19536.90k    19741.70k    19800.06k
>     aes-256 cbc      15815.89k    17003.01k    17530.03k    17695.40k    17746.60k
> 
> For some reason AES and blowfish do not benefit, but SHA performance
> improves between 1.7x and 15x.  SHA256 clearly benefits the most from the
> optimization (4.5x on small blocks, 15x on large blocks!).
> 
> When using EVP (with "openssl speed -evp <algo>"):
> 
>     # Before, EVP mode
>     type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
>     sha1              3824.46k    10049.66k    19170.56k    24947.03k    27325.78k
>     sha256            3368.33k     8511.15k    16061.44k    20772.52k    22721.88k
>     sha512            2845.23k    11381.57k    19467.69k    28512.26k    33008.30k
>     bf-cbc           15146.74k    16623.83k    17092.01k    17211.39k    17249.62k
>     aes-128-cbc      17873.03k    20870.61k    21933.65k    22216.36k    22301.35k
>     aes-192-cbc      16184.18k    18607.15k    19447.13k    19670.02k    19737.26k
>     aes-256-cbc      14774.06k    16757.25k    17457.58k    17639.42k    17686.53k
> 
>     # After, EVP mode
>     type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
>     sha1              7056.97k    27142.10k    89515.86k   209155.41k   347419.99k
>     sha256            7745.70k    29750.06k    95341.48k   211001.69k   332376.75k
>     sha512            4550.47k    18086.06k    39997.10k    65880.75k    81431.21k
>     bf-cbc           15129.20k    16619.03k    17090.56k    17212.76k    17246.89k
>     aes-128-cbc      99619.74k   269032.34k   450214.23k   567353.00k   613933.06k
>     aes-192-cbc      93180.74k   231017.79k   361766.66k   433671.51k   461731.16k
>     aes-256-cbc      89343.23k   209858.58k   310160.04k   362234.88k   380878.85k
> 
> Blowfish does not seem to have any assembler optimization at all.  SHA
> still benefits (between 1.6x and 14.5x), but is generally slower than in
> non-EVP mode.
> 
> However, AES performance is improved between 5.5x and 27.5x, which is
> really impressive!  For aes-128-cbc on large blocks, a core i7-6600U
> @2.60GHz is only twice as fast...
> 
> Signed-off-by: Baptiste Jonglez <git at bitsofnetworks.org>
> ---
>  package/libs/openssl/Makefile                            | 4 +++-
>  package/libs/openssl/patches/110-optimize-for-size.patch | 3 ++-
>  2 files changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/package/libs/openssl/Makefile b/package/libs/openssl/Makefile
> index 7707c19431..d7037cb7c1 100644
> --- a/package/libs/openssl/Makefile
> +++ b/package/libs/openssl/Makefile
> @@ -11,7 +11,7 @@ PKG_NAME:=openssl
>  PKG_BASE:=1.0.2
>  PKG_BUGFIX:=l
>  PKG_VERSION:=$(PKG_BASE)$(PKG_BUGFIX)
> -PKG_RELEASE:=1
> +PKG_RELEASE:=2
>  PKG_USE_MIPS16:=0
>  
>  PKG_BUILD_PARALLEL:=0
> @@ -161,6 +161,8 @@ else
>    OPENSSL_OPTIONS+=no-sse2
>    ifeq ($(CONFIG_mips)$(CONFIG_mipsel),y)
>      OPENSSL_TARGET:=linux-mips-openwrt
> +  else ifeq ($(CONFIG_aarch64),y)
> +    OPENSSL_TARGET:=linux-aarch64-openwrt
>    else ifeq ($(CONFIG_arm)$(CONFIG_armeb),y)
>      OPENSSL_TARGET:=linux-armv4-openwrt
>    else
> diff --git a/package/libs/openssl/patches/110-optimize-for-size.patch b/package/libs/openssl/patches/110-optimize-for-size.patch
> index 0f174a3469..d6d4a21111 100644
> --- a/package/libs/openssl/patches/110-optimize-for-size.patch
> +++ b/package/libs/openssl/patches/110-optimize-for-size.patch
> @@ -1,11 +1,12 @@
>  --- a/Configure
>  +++ b/Configure
> -@@ -470,6 +470,12 @@ my %table=(
> +@@ -470,6 +470,13 @@ my %table=(
>   "linux-alpha-ccc","ccc:-fast -readonly_strings -DL_ENDIAN::-D_REENTRANT:::SIXTY_FOUR_BIT_LONG RC4_CHUNK DES_INT DES_PTR DES_RISC1 DES_UNROLL:${alpha_asm}",
>   "linux-alpha+bwx-ccc","ccc:-fast -readonly_strings -DL_ENDIAN::-D_REENTRANT:::SIXTY_FOUR_BIT_LONG RC4_CHAR RC4_CHUNK DES_INT DES_PTR DES_RISC1 DES_UNROLL:${alpha_asm}",
>   
>  +# OpenWrt targets
>  +"linux-armv4-openwrt","gcc:-DTERMIOS \$(OPENWRT_OPTIMIZATION_FLAGS) -fomit-frame-pointer -Wall::-D_REENTRANT::-ldl:BN_LLONG RC4_CHAR RC4_CHUNK DES_INT DES_UNROLL BF_PTR:${armv4_asm}:dlfcn:linux-shared:-fPIC::.so.\$(SHLIB_MAJOR).\$(SHLIB_MINOR)",
> ++"linux-aarch64-openwrt","gcc:-DTERMIOS \$(OPENWRT_OPTIMIZATION_FLAGS) -fomit-frame-pointer -Wall::-D_REENTRANT::-ldl:SIXTY_FOUR_BIT_LONG RC4_CHAR RC4_CHUNK DES_INT DES_UNROLL BF_PTR:${aarch64_asm}:linux64:dlfcn:linux-shared:-fPIC::.so.\$(SHLIB_MAJOR).\$(SHLIB_MINOR)",
>  +"linux-x86_64-openwrt",	"gcc:-m64 -DL_ENDIAN -DTERMIOS \$(OPENWRT_OPTIMIZATION_FLAGS) -fomit-frame-pointer -Wall::-D_REENTRANT::-ldl:SIXTY_FOUR_BIT_LONG RC4_CHUNK DES_INT DES_UNROLL:${x86_64_asm}:elf:dlfcn:linux-shared:-fPIC:-m64:.so.\$(SHLIB_MAJOR).\$(SHLIB_MINOR):::64",
>  +"linux-mips-openwrt","gcc:-DTERMIOS \$(OPENWRT_OPTIMIZATION_FLAGS) -fomit-frame-pointer -Wall::-D_REENTRANT::-ldl:BN_LLONG RC4_CHAR RC4_CHUNK DES_INT DES_UNROLL BF_PTR:${mips32_asm}:o32:dlfcn:linux-shared:-fPIC::.so.\$(SHLIB_MAJOR).\$(SHLIB_MINOR)",
>  +"linux-generic-openwrt","gcc:-DTERMIOS \$(OPENWRT_OPTIMIZATION_FLAGS) -fomit-frame-pointer -Wall::-D_REENTRANT::-ldl:BN_LLONG RC4_CHAR RC4_CHUNK DES_INT DES_UNROLL BF_PTR:${no_asm}:dlfcn:linux-shared:-fPIC::.so.\$(SHLIB_MAJOR).\$(SHLIB_MINOR)",