[LEDE-DEV] [PATCH] openssl: Enable assembler optimizations for aarch64
Baptiste Jonglez
baptiste at bitsofnetworks.org
Mon Oct 30 12:16:16 PDT 2017
On 28-10-17, Baptiste Jonglez wrote:
> The awesome AES performance was too good to be true: it seems to produce
> incorrect results when encrypting on the pine64 and decrypting on an x86_64
> machine :(
> Possibly some assembler is optimized away by the compiler, which would
> explain why it's so fast. Please don't merge for now until I investigate.
After investigating, there is actually no issue, so this is good to merge!
For the details:
- I was using openssl 1.1 for encrypting and openssl 1.0 for decrypting,
  and the default digest changed between these two versions, so I was
  bitten by https://www.openssl.org/docs/faq.html#USER3 .
  Using the same digest algorithm on both sides yields correct results.
- AES performance is so good because openssl exploits the dedicated
  hardware instructions for AES found in most aarch64 CPUs. Support for
  this was introduced 3 years ago:
  https://github.com/openssl/openssl/commit/9af4cb3d3beaaed8af33ee0bbc547cfef49c88a6
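
To illustrate the first point, here is a minimal Python sketch of why a digest mismatch produces garbage. It assumes the password-based key derivation of "openssl enc" (EVP_BytesToKey with a single iteration) and the defaults I believe apply here, MD5 for 1.0 and SHA256 for 1.1. The function name, password and salt are made up for the example:

```python
import hashlib


def evp_bytes_to_key(password, salt, digest, key_len=32, iv_len=16):
    """Derive (key, iv) like EVP_BytesToKey with an iteration count of 1.

    Illustrative helper, not part of any OpenSSL binding.
    """
    derived = b""
    block = b""
    while len(derived) < key_len + iv_len:
        # Each round hashes the previous digest, then the password and salt.
        block = hashlib.new(digest, block + password + salt).digest()
        derived += block
    return derived[:key_len], derived[key_len:key_len + iv_len]


# Hypothetical inputs, only used to show the effect.
password = b"example passphrase"
salt = bytes.fromhex("0102030405060708")

# Key/IV sizes below match aes-256-cbc.
key_md5, iv_md5 = evp_bytes_to_key(password, salt, "md5")
key_sha256, iv_sha256 = evp_bytes_to_key(password, salt, "sha256")

print("md5    key:", key_md5.hex())
print("sha256 key:", key_sha256.hex())
print("keys match:", key_md5 == key_sha256)
```

The same password and salt give two unrelated keys, so the decrypting side cannot recover the plaintext. Passing the digest explicitly with "-md" on both sides avoids depending on the default.
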
Baptiste
> On 27-10-17, Baptiste Jonglez wrote:
> > OpenSSL is built with the generic linux settings for most targets,
> > including aarch64. These generic settings are designed for 32-bit CPUs and
> > provide no assembler optimization: this is far from optimal for aarch64.
> >
> > This patch simply switches to the aarch64 settings that are already
> > available in OpenSSL.
> >
> > Here is the output of "openssl speed" before the optimization, with
> > "(...)" representing build flags that didn't change:
> >
> > OpenSSL 1.0.2l 25 May 2017
> > options:bn(64,32) rc4(ptr,char) des(idx,cisc,2,int) aes(partial) blowfish(ptr)
> > compiler: aarch64-openwrt-linux-musl-gcc (...)
> >
> > And after this patch, OpenSSL uses 64-bit mode and assembler optimizations:
> >
> > OpenSSL 1.0.2l 25 May 2017
> > options:bn(64,64) rc4(ptr,char) des(idx,cisc,2,int) aes(partial) blowfish(ptr)
> > compiler: aarch64-openwrt-linux-musl-gcc (...) -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM
> >
> > Here are some benchmarks on a pine64+ running latest LEDE master r5142-20d363aed3:
> >
> > before# openssl speed sha aes blowfish
> > The 'numbers' are in 1000s of bytes per second processed.
> > type             16 bytes    64 bytes   256 bytes  1024 bytes  8192 bytes
> > sha1             3918.89k    9982.43k   19148.03k   24933.03k   27325.78k
> > sha256           4604.51k   10240.64k   17472.51k   21355.18k   22801.07k
> > sha512           3662.19k   14539.41k   21443.16k   29544.11k   33177.60k
> > blowfish cbc    16266.63k   16940.86k   17176.92k   17237.33k   17252.35k
> > aes-128 cbc     19712.95k   21447.40k   22091.09k   22258.35k   22304.09k
> > aes-192 cbc     17680.12k   19064.47k   19572.14k   19703.13k   19737.26k
> > aes-256 cbc     15986.67k   17132.48k   17537.28k   17657.17k   17689.26k
> >
> > after# openssl speed sha aes blowfish
> > type             16 bytes    64 bytes   256 bytes  1024 bytes  8192 bytes
> > sha1             6770.87k   26172.80k   86878.38k  205649.58k  345978.20k
> > sha256          20913.93k   74663.85k  184658.18k  290891.09k  351032.66k
> > sha512           7633.10k   30110.14k   50083.24k   71883.43k   82485.25k
> > blowfish cbc    16224.93k   16933.55k   17173.76k   17234.94k   17252.35k
> > aes-128 cbc     19425.74k   21193.31k   22065.74k   22304.77k   22380.54k
> > aes-192 cbc     17452.29k   18883.84k   19536.90k   19741.70k   19800.06k
> > aes-256 cbc     15815.89k   17003.01k   17530.03k   17695.40k   17746.60k
> >
> > For some reason AES and blowfish do not benefit, but SHA performance
> > improves between 1.7x and 15x. SHA256 clearly benefits the most from the
> > optimization (4.5x on small blocks, 15x on large blocks!).
> >
> > When using EVP (with "openssl speed -evp <algo>"):
> >
> > # Before, EVP mode
> > type             16 bytes    64 bytes   256 bytes  1024 bytes  8192 bytes
> > sha1             3824.46k   10049.66k   19170.56k   24947.03k   27325.78k
> > sha256           3368.33k    8511.15k   16061.44k   20772.52k   22721.88k
> > sha512           2845.23k   11381.57k   19467.69k   28512.26k   33008.30k
> > bf-cbc          15146.74k   16623.83k   17092.01k   17211.39k   17249.62k
> > aes-128-cbc     17873.03k   20870.61k   21933.65k   22216.36k   22301.35k
> > aes-192-cbc     16184.18k   18607.15k   19447.13k   19670.02k   19737.26k
> > aes-256-cbc     14774.06k   16757.25k   17457.58k   17639.42k   17686.53k
> >
> > # After, EVP mode
> > type             16 bytes    64 bytes   256 bytes  1024 bytes  8192 bytes
> > sha1             7056.97k   27142.10k   89515.86k  209155.41k  347419.99k
> > sha256           7745.70k   29750.06k   95341.48k  211001.69k  332376.75k
> > sha512           4550.47k   18086.06k   39997.10k   65880.75k   81431.21k
> > bf-cbc          15129.20k   16619.03k   17090.56k   17212.76k   17246.89k
> > aes-128-cbc     99619.74k  269032.34k  450214.23k  567353.00k  613933.06k
> > aes-192-cbc     93180.74k  231017.79k  361766.66k  433671.51k  461731.16k
> > aes-256-cbc     89343.23k  209858.58k  310160.04k  362234.88k  380878.85k
> >
> > Blowfish does not seem to have assembler optimization at all, and SHA
> > still benefits (between 1.6x and 14.5x) but is generally slower than in
> > non-EVP mode.
> >
> > However, AES performance is improved between 5.5x and 27.5x, which is
> > really impressive! For aes-128-cbc on large blocks, a core i7-6600U
> > @2.60GHz is only twice as fast...
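
As a sanity check, the speedup factors quoted above can be recomputed from the EVP-mode tables. A minimal Python sketch, with the figures copied from those tables for three of the algorithms (the variable and function names are only for this example):

```python
# Throughput in 1000s of bytes per second, copied from the EVP-mode tables.
# Columns: 16, 64, 256, 1024 and 8192 byte blocks.
before = {
    "sha256":      [3368.33, 8511.15, 16061.44, 20772.52, 22721.88],
    "sha512":      [2845.23, 11381.57, 19467.69, 28512.26, 33008.30],
    "aes-128-cbc": [17873.03, 20870.61, 21933.65, 22216.36, 22301.35],
}
after = {
    "sha256":      [7745.70, 29750.06, 95341.48, 211001.69, 332376.75],
    "sha512":      [4550.47, 18086.06, 39997.10, 65880.75, 81431.21],
    "aes-128-cbc": [99619.74, 269032.34, 450214.23, 567353.00, 613933.06],
}


def speedups(before_row, after_row):
    """Return the after/before throughput ratio for each block size."""
    return [a / b for b, a in zip(before_row, after_row)]


for algo in before:
    ratios = speedups(before[algo], after[algo])
    print(f"{algo:12s}", " ".join(f"{r:6.2f}x" for r in ratios))
```

The smallest and largest ratios in each row are where the "between ... and ..." figures come from: sha512 on 16-byte blocks and sha256 on 8192-byte blocks for SHA, and aes-128-cbc on 16-byte and 8192-byte blocks for AES.
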
> >
> > Signed-off-by: Baptiste Jonglez <git at bitsofnetworks.org>
> > ---
> > package/libs/openssl/Makefile | 4 +++-
> > package/libs/openssl/patches/110-optimize-for-size.patch | 3 ++-
> > 2 files changed, 5 insertions(+), 2 deletions(-)
> >
> > diff --git a/package/libs/openssl/Makefile b/package/libs/openssl/Makefile
> > index 7707c19431..d7037cb7c1 100644
> > --- a/package/libs/openssl/Makefile
> > +++ b/package/libs/openssl/Makefile
> > @@ -11,7 +11,7 @@ PKG_NAME:=openssl
> > PKG_BASE:=1.0.2
> > PKG_BUGFIX:=l
> > PKG_VERSION:=$(PKG_BASE)$(PKG_BUGFIX)
> > -PKG_RELEASE:=1
> > +PKG_RELEASE:=2
> > PKG_USE_MIPS16:=0
> >
> > PKG_BUILD_PARALLEL:=0
> > @@ -161,6 +161,8 @@ else
> > OPENSSL_OPTIONS+=no-sse2
> > ifeq ($(CONFIG_mips)$(CONFIG_mipsel),y)
> > OPENSSL_TARGET:=linux-mips-openwrt
> > + else ifeq ($(CONFIG_aarch64),y)
> > + OPENSSL_TARGET:=linux-aarch64-openwrt
> > else ifeq ($(CONFIG_arm)$(CONFIG_armeb),y)
> > OPENSSL_TARGET:=linux-armv4-openwrt
> > else
> > diff --git a/package/libs/openssl/patches/110-optimize-for-size.patch b/package/libs/openssl/patches/110-optimize-for-size.patch
> > index 0f174a3469..d6d4a21111 100644
> > --- a/package/libs/openssl/patches/110-optimize-for-size.patch
> > +++ b/package/libs/openssl/patches/110-optimize-for-size.patch
> > @@ -1,11 +1,12 @@
> > --- a/Configure
> > +++ b/Configure
> > -@@ -470,6 +470,12 @@ my %table=(
> > +@@ -470,6 +470,13 @@ my %table=(
> > "linux-alpha-ccc","ccc:-fast -readonly_strings -DL_ENDIAN::-D_REENTRANT:::SIXTY_FOUR_BIT_LONG RC4_CHUNK DES_INT DES_PTR DES_RISC1 DES_UNROLL:${alpha_asm}",
> > "linux-alpha+bwx-ccc","ccc:-fast -readonly_strings -DL_ENDIAN::-D_REENTRANT:::SIXTY_FOUR_BIT_LONG RC4_CHAR RC4_CHUNK DES_INT DES_PTR DES_RISC1 DES_UNROLL:${alpha_asm}",
> >
> > +# OpenWrt targets
> > +"linux-armv4-openwrt","gcc:-DTERMIOS \$(OPENWRT_OPTIMIZATION_FLAGS) -fomit-frame-pointer -Wall::-D_REENTRANT::-ldl:BN_LLONG RC4_CHAR RC4_CHUNK DES_INT DES_UNROLL BF_PTR:${armv4_asm}:dlfcn:linux-shared:-fPIC::.so.\$(SHLIB_MAJOR).\$(SHLIB_MINOR)",
> > ++"linux-aarch64-openwrt","gcc:-DTERMIOS \$(OPENWRT_OPTIMIZATION_FLAGS) -fomit-frame-pointer -Wall::-D_REENTRANT::-ldl:SIXTY_FOUR_BIT_LONG RC4_CHAR RC4_CHUNK DES_INT DES_UNROLL BF_PTR:${aarch64_asm}:linux64:dlfcn:linux-shared:-fPIC::.so.\$(SHLIB_MAJOR).\$(SHLIB_MINOR)",
> > +"linux-x86_64-openwrt", "gcc:-m64 -DL_ENDIAN -DTERMIOS \$(OPENWRT_OPTIMIZATION_FLAGS) -fomit-frame-pointer -Wall::-D_REENTRANT::-ldl:SIXTY_FOUR_BIT_LONG RC4_CHUNK DES_INT DES_UNROLL:${x86_64_asm}:elf:dlfcn:linux-shared:-fPIC:-m64:.so.\$(SHLIB_MAJOR).\$(SHLIB_MINOR):::64",
> > +"linux-mips-openwrt","gcc:-DTERMIOS \$(OPENWRT_OPTIMIZATION_FLAGS) -fomit-frame-pointer -Wall::-D_REENTRANT::-ldl:BN_LLONG RC4_CHAR RC4_CHUNK DES_INT DES_UNROLL BF_PTR:${mips32_asm}:o32:dlfcn:linux-shared:-fPIC::.so.\$(SHLIB_MAJOR).\$(SHLIB_MINOR)",
> > +"linux-generic-openwrt","gcc:-DTERMIOS \$(OPENWRT_OPTIMIZATION_FLAGS) -fomit-frame-pointer -Wall::-D_REENTRANT::-ldl:BN_LLONG RC4_CHAR RC4_CHUNK DES_INT DES_UNROLL BF_PTR:${no_asm}:dlfcn:linux-shared:-fPIC::.so.\$(SHLIB_MAJOR).\$(SHLIB_MINOR)",