[LEDE-DEV] [PATCH] openssl: Enable assembler optimizations for aarch64

Baptiste Jonglez baptiste at bitsofnetworks.org
Mon Oct 30 12:16:16 PDT 2017


On 28-10-17, Baptiste Jonglez wrote:
> The awesome AES performance was too good to be true: it seems to produce
> incorrect results when encrypting on the pine64 and decrypting on an x86_64
> machine :(
> Possibly some assembler is optimized away by the compiler, which would
> explain why it's so fast.  Please don't merge for now until I investigate.

After investigating, there is actually no issue, so this is good to merge!

For the details:

- I was using openssl 1.1 for encrypting and openssl 1.0 for decrypting,
  so I was bitten by the change of default digest between the two
  versions (see https://www.openssl.org/docs/faq.html#USER3).
  Using the same digest algorithm on both sides yields correct results.

- AES performance is so good because openssl exploits the dedicated
  hardware instructions for AES found in most aarch64 CPUs.  Support for
  this was introduced 3 years ago:
  https://github.com/openssl/openssl/commit/9af4cb3d3beaaed8af33ee0bbc547cfef49c88a6
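
To illustrate the first point: as far as I understand, "openssl enc"
derives the key and IV from the passphrase with EVP_BytesToKey, and the
default digest for that step is MD5 in 1.0.x but SHA-256 in 1.1.0.  Below is
a minimal Python sketch of that derivation with an iteration count of 1.
The password and salt are made up for illustration, and this is not
OpenSSL's own code:

```python
import hashlib


def evp_bytes_to_key(password, salt, key_len, iv_len, digest):
    """Derive (key, iv) the way EVP_BytesToKey does with an iteration count of 1."""
    derived = b""
    block = b""
    while len(derived) < key_len + iv_len:
        # Each round hashes the previous block, the password and the salt.
        block = hashlib.new(digest, block + password + salt).digest()
        derived += block
    return derived[:key_len], derived[key_len:key_len + iv_len]


# Hypothetical password and salt, for illustration only.
password = b"example-password"
salt = bytes.fromhex("0102030405060708")

# aes-256-cbc needs a 32-byte key and a 16-byte IV.
key_md5, iv_md5 = evp_bytes_to_key(password, salt, 32, 16, "md5")
key_sha256, iv_sha256 = evp_bytes_to_key(password, salt, 32, 16, "sha256")

print("md5    key:", key_md5.hex())
print("sha256 key:", key_sha256.hex())
print("keys match:", key_md5 == key_sha256)
```

The two keys differ, so data encrypted with one default cannot be
decrypted with the other.  Passing the same "-md <digest>" option to
"openssl enc" on both machines avoids the mismatch.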

Baptiste
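
PS: the AES speedup range given in the quoted patch below ("between 5.5x
and 27.5x") can be rechecked from the two EVP tables.  A small Python
sketch, with the numbers copied by hand from those tables (the variable
and function names are mine):

```python
# Throughput in 1000s of bytes per second, copied from the EVP tables
# in the patch message (block sizes: 16, 64, 256, 1024, 8192 bytes).
BLOCK_SIZES = [16, 64, 256, 1024, 8192]

before = {
    "aes-128-cbc": [17873.03, 20870.61, 21933.65, 22216.36, 22301.35],
    "aes-192-cbc": [16184.18, 18607.15, 19447.13, 19670.02, 19737.26],
    "aes-256-cbc": [14774.06, 16757.25, 17457.58, 17639.42, 17686.53],
}
after = {
    "aes-128-cbc": [99619.74, 269032.34, 450214.23, 567353.00, 613933.06],
    "aes-192-cbc": [93180.74, 231017.79, 361766.66, 433671.51, 461731.16],
    "aes-256-cbc": [89343.23, 209858.58, 310160.04, 362234.88, 380878.85],
}


def speedups(before, after):
    """Return the per-block-size ratio after/before for each algorithm."""
    return {
        algo: [a / b for a, b in zip(after[algo], before[algo])]
        for algo in before
    }


ratios = speedups(before, after)
for algo, values in ratios.items():
    print(algo, " ".join(f"{v:5.1f}x" for v in values))

all_ratios = [v for values in ratios.values() for v in values]
print(f"min {min(all_ratios):.1f}x, max {max(all_ratios):.1f}x")
```

The smallest ratio is aes-128-cbc on 16-byte blocks and the largest is
aes-128-cbc on 8192-byte blocks, which matches the range quoted in the
patch message.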

> On 27-10-17, Baptiste Jonglez wrote:
> > OpenSSL is built with the generic linux settings for most targets,
> > including aarch64.  These generic settings are designed for 32-bit CPUs and
> > provide no assembler optimization: this is highly suboptimal for aarch64.
> > 
> > This patch simply switches to the aarch64 settings that are already
> > available in OpenSSL.
> > 
> > Here is the output of "openssl speed" before the optimization, with
> > "(...)" representing build flags that didn't change:
> > 
> >     OpenSSL 1.0.2l  25 May 2017
> >     options:bn(64,32) rc4(ptr,char) des(idx,cisc,2,int) aes(partial) blowfish(ptr)
> >     compiler: aarch64-openwrt-linux-musl-gcc  (...)
> > 
> > And after this patch, OpenSSL uses 64-bit mode and assembler optimizations:
> > 
> >     OpenSSL 1.0.2l  25 May 2017
> >     options:bn(64,64) rc4(ptr,char) des(idx,cisc,2,int) aes(partial) blowfish(ptr)
> >     compiler: aarch64-openwrt-linux-musl-gcc  (...)  -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM
> > 
> > Here are some benchmarks on a pine64+ running latest LEDE master r5142-20d363aed3:
> > 
> >     before# openssl speed sha aes blowfish
> >     The 'numbers' are in 1000s of bytes per second processed.
> >     type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
> >     sha1              3918.89k     9982.43k    19148.03k    24933.03k    27325.78k
> >     sha256            4604.51k    10240.64k    17472.51k    21355.18k    22801.07k
> >     sha512            3662.19k    14539.41k    21443.16k    29544.11k    33177.60k
> >     blowfish cbc     16266.63k    16940.86k    17176.92k    17237.33k    17252.35k
> >     aes-128 cbc      19712.95k    21447.40k    22091.09k    22258.35k    22304.09k
> >     aes-192 cbc      17680.12k    19064.47k    19572.14k    19703.13k    19737.26k
> >     aes-256 cbc      15986.67k    17132.48k    17537.28k    17657.17k    17689.26k
> > 
> >     after# openssl speed sha aes blowfish
> >     type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
> >     sha1              6770.87k    26172.80k    86878.38k   205649.58k   345978.20k
> >     sha256           20913.93k    74663.85k   184658.18k   290891.09k   351032.66k
> >     sha512            7633.10k    30110.14k    50083.24k    71883.43k    82485.25k
> >     blowfish cbc     16224.93k    16933.55k    17173.76k    17234.94k    17252.35k
> >     aes-128 cbc      19425.74k    21193.31k    22065.74k    22304.77k    22380.54k
> >     aes-192 cbc      17452.29k    18883.84k    19536.90k    19741.70k    19800.06k
> >     aes-256 cbc      15815.89k    17003.01k    17530.03k    17695.40k    17746.60k
> > 
> > For some reason AES and blowfish do not benefit, but SHA performance
> > improves between 1.7x and 15x.  SHA256 clearly benefits the most from the
> > optimization (4.5x on small blocks, 15x on large blocks!).
> > 
> > When using EVP (with "openssl speed -evp <algo>"):
> > 
> >     # Before, EVP mode
> >     type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
> >     sha1              3824.46k    10049.66k    19170.56k    24947.03k    27325.78k
> >     sha256            3368.33k     8511.15k    16061.44k    20772.52k    22721.88k
> >     sha512            2845.23k    11381.57k    19467.69k    28512.26k    33008.30k
> >     bf-cbc           15146.74k    16623.83k    17092.01k    17211.39k    17249.62k
> >     aes-128-cbc      17873.03k    20870.61k    21933.65k    22216.36k    22301.35k
> >     aes-192-cbc      16184.18k    18607.15k    19447.13k    19670.02k    19737.26k
> >     aes-256-cbc      14774.06k    16757.25k    17457.58k    17639.42k    17686.53k
> > 
> >     # After, EVP mode
> >     type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
> >     sha1              7056.97k    27142.10k    89515.86k   209155.41k   347419.99k
> >     sha256            7745.70k    29750.06k    95341.48k   211001.69k   332376.75k
> >     sha512            4550.47k    18086.06k    39997.10k    65880.75k    81431.21k
> >     bf-cbc           15129.20k    16619.03k    17090.56k    17212.76k    17246.89k
> >     aes-128-cbc      99619.74k   269032.34k   450214.23k   567353.00k   613933.06k
> >     aes-192-cbc      93180.74k   231017.79k   361766.66k   433671.51k   461731.16k
> >     aes-256-cbc      89343.23k   209858.58k   310160.04k   362234.88k   380878.85k
> > 
> > Blowfish does not seem to have assembler optimization at all, and SHA
> > still benefits (between 1.6x and 14.5x) but is generally slower than in
> > non-EVP mode.
> > 
> > However, AES performance is improved between 5.5x and 27.5x, which is
> > really impressive!  For aes-128-cbc on large blocks, a core i7-6600U
> > @2.60GHz is only twice as fast...
> > 
> > Signed-off-by: Baptiste Jonglez <git at bitsofnetworks.org>
> > ---
> >  package/libs/openssl/Makefile                            | 4 +++-
> >  package/libs/openssl/patches/110-optimize-for-size.patch | 3 ++-
> >  2 files changed, 5 insertions(+), 2 deletions(-)
> > 
> > diff --git a/package/libs/openssl/Makefile b/package/libs/openssl/Makefile
> > index 7707c19431..d7037cb7c1 100644
> > --- a/package/libs/openssl/Makefile
> > +++ b/package/libs/openssl/Makefile
> > @@ -11,7 +11,7 @@ PKG_NAME:=openssl
> >  PKG_BASE:=1.0.2
> >  PKG_BUGFIX:=l
> >  PKG_VERSION:=$(PKG_BASE)$(PKG_BUGFIX)
> > -PKG_RELEASE:=1
> > +PKG_RELEASE:=2
> >  PKG_USE_MIPS16:=0
> >  
> >  PKG_BUILD_PARALLEL:=0
> > @@ -161,6 +161,8 @@ else
> >    OPENSSL_OPTIONS+=no-sse2
> >    ifeq ($(CONFIG_mips)$(CONFIG_mipsel),y)
> >      OPENSSL_TARGET:=linux-mips-openwrt
> > +  else ifeq ($(CONFIG_aarch64),y)
> > +    OPENSSL_TARGET:=linux-aarch64-openwrt
> >    else ifeq ($(CONFIG_arm)$(CONFIG_armeb),y)
> >      OPENSSL_TARGET:=linux-armv4-openwrt
> >    else
> > diff --git a/package/libs/openssl/patches/110-optimize-for-size.patch b/package/libs/openssl/patches/110-optimize-for-size.patch
> > index 0f174a3469..d6d4a21111 100644
> > --- a/package/libs/openssl/patches/110-optimize-for-size.patch
> > +++ b/package/libs/openssl/patches/110-optimize-for-size.patch
> > @@ -1,11 +1,12 @@
> >  --- a/Configure
> >  +++ b/Configure
> > -@@ -470,6 +470,12 @@ my %table=(
> > +@@ -470,6 +470,13 @@ my %table=(
> >   "linux-alpha-ccc","ccc:-fast -readonly_strings -DL_ENDIAN::-D_REENTRANT:::SIXTY_FOUR_BIT_LONG RC4_CHUNK DES_INT DES_PTR DES_RISC1 DES_UNROLL:${alpha_asm}",
> >   "linux-alpha+bwx-ccc","ccc:-fast -readonly_strings -DL_ENDIAN::-D_REENTRANT:::SIXTY_FOUR_BIT_LONG RC4_CHAR RC4_CHUNK DES_INT DES_PTR DES_RISC1 DES_UNROLL:${alpha_asm}",
> >   
> >  +# OpenWrt targets
> >  +"linux-armv4-openwrt","gcc:-DTERMIOS \$(OPENWRT_OPTIMIZATION_FLAGS) -fomit-frame-pointer -Wall::-D_REENTRANT::-ldl:BN_LLONG RC4_CHAR RC4_CHUNK DES_INT DES_UNROLL BF_PTR:${armv4_asm}:dlfcn:linux-shared:-fPIC::.so.\$(SHLIB_MAJOR).\$(SHLIB_MINOR)",
> > ++"linux-aarch64-openwrt","gcc:-DTERMIOS \$(OPENWRT_OPTIMIZATION_FLAGS) -fomit-frame-pointer -Wall::-D_REENTRANT::-ldl:SIXTY_FOUR_BIT_LONG RC4_CHAR RC4_CHUNK DES_INT DES_UNROLL BF_PTR:${aarch64_asm}:linux64:dlfcn:linux-shared:-fPIC::.so.\$(SHLIB_MAJOR).\$(SHLIB_MINOR)",
> >  +"linux-x86_64-openwrt",	"gcc:-m64 -DL_ENDIAN -DTERMIOS \$(OPENWRT_OPTIMIZATION_FLAGS) -fomit-frame-pointer -Wall::-D_REENTRANT::-ldl:SIXTY_FOUR_BIT_LONG RC4_CHUNK DES_INT DES_UNROLL:${x86_64_asm}:elf:dlfcn:linux-shared:-fPIC:-m64:.so.\$(SHLIB_MAJOR).\$(SHLIB_MINOR):::64",
> >  +"linux-mips-openwrt","gcc:-DTERMIOS \$(OPENWRT_OPTIMIZATION_FLAGS) -fomit-frame-pointer -Wall::-D_REENTRANT::-ldl:BN_LLONG RC4_CHAR RC4_CHUNK DES_INT DES_UNROLL BF_PTR:${mips32_asm}:o32:dlfcn:linux-shared:-fPIC::.so.\$(SHLIB_MAJOR).\$(SHLIB_MINOR)",
> >  +"linux-generic-openwrt","gcc:-DTERMIOS \$(OPENWRT_OPTIMIZATION_FLAGS) -fomit-frame-pointer -Wall::-D_REENTRANT::-ldl:BN_LLONG RC4_CHAR RC4_CHUNK DES_INT DES_UNROLL BF_PTR:${no_asm}:dlfcn:linux-shared:-fPIC::.so.\$(SHLIB_MAJOR).\$(SHLIB_MINOR)",



