[PATCH] arm64: add support for SHA256 using NEON instructions
Ard Biesheuvel
ard.biesheuvel at linaro.org
Thu Sep 29 15:51:42 PDT 2016
This is a port of the ARMv7 implementation in arch/arm/crypto. The performance
numbers below were measured on a Cortex-A57 (r2p1). In summary, sha256-neon is
40% - 50% faster than sha256-generic where it counts, i.e., block sizes over
256 bytes with few updates.
testing speed of async sha256 (sha256-generic)
( 16 byte blocks, 16 bytes x 1 updates): 1379992 ops/s, 22079872 Bps
( 64 byte blocks, 16 bytes x 4 updates): 633455 ops/s, 40541120 Bps
( 64 byte blocks, 64 bytes x 1 updates): 738076 ops/s, 47236864 Bps
( 256 byte blocks, 16 bytes x 16 updates): 234420 ops/s, 60011520 Bps
( 256 byte blocks, 64 bytes x 4 updates): 293008 ops/s, 75010048 Bps
( 256 byte blocks, 256 bytes x 1 updates): 309600 ops/s, 79257600 Bps
( 1024 byte blocks, 16 bytes x 64 updates): 66997 ops/s, 68604928 Bps
( 1024 byte blocks, 256 bytes x 4 updates): 91912 ops/s, 94117888 Bps
( 1024 byte blocks, 1024 bytes x 1 updates): 93992 ops/s, 96247808 Bps
( 2048 byte blocks, 16 bytes x 128 updates): 34385 ops/s, 70420480 Bps
( 2048 byte blocks, 256 bytes x 8 updates): 47570 ops/s, 97423360 Bps
( 2048 byte blocks, 1024 bytes x 2 updates): 48557 ops/s, 99444736 Bps
( 2048 byte blocks, 2048 bytes x 1 updates): 48781 ops/s, 99903488 Bps
( 4096 byte blocks, 16 bytes x 256 updates): 17401 ops/s, 71274496 Bps
( 4096 byte blocks, 256 bytes x 16 updates): 24211 ops/s, 99168256 Bps
( 4096 byte blocks, 1024 bytes x 4 updates): 24720 ops/s, 101253120 Bps
( 4096 byte blocks, 4096 bytes x 1 updates): 24930 ops/s, 102113280 Bps
( 8192 byte blocks, 16 bytes x 512 updates): 8738 ops/s, 71581696 Bps
( 8192 byte blocks, 256 bytes x 32 updates): 12214 ops/s, 100057088 Bps
( 8192 byte blocks, 1024 bytes x 8 updates): 12474 ops/s, 102187008 Bps
( 8192 byte blocks, 4096 bytes x 2 updates): 12558 ops/s, 102875136 Bps
( 8192 byte blocks, 8192 bytes x 1 updates): 12555 ops/s, 102850560 Bps
testing speed of async sha256 (sha256-neon)
( 16 byte blocks, 16 bytes x 1 updates): 1802881 ops/s, 28846096 Bps
( 64 byte blocks, 16 bytes x 4 updates): 744861 ops/s, 47671104 Bps
( 64 byte blocks, 64 bytes x 1 updates): 1015413 ops/s, 64986432 Bps
( 256 byte blocks, 16 bytes x 16 updates): 281055 ops/s, 71950080 Bps
( 256 byte blocks, 64 bytes x 4 updates): 378437 ops/s, 96879872 Bps
( 256 byte blocks, 256 bytes x 1 updates): 453325 ops/s, 116051200 Bps
( 1024 byte blocks, 16 bytes x 64 updates): 79809 ops/s, 81724416 Bps
( 1024 byte blocks, 256 bytes x 4 updates): 131621 ops/s, 134779904 Bps
( 1024 byte blocks, 1024 bytes x 1 updates): 140708 ops/s, 144084992 Bps
( 2048 byte blocks, 16 bytes x 128 updates): 40900 ops/s, 83763200 Bps
( 2048 byte blocks, 256 bytes x 8 updates): 68348 ops/s, 139976704 Bps
( 2048 byte blocks, 1024 bytes x 2 updates): 72051 ops/s, 147560448 Bps
( 2048 byte blocks, 2048 bytes x 1 updates): 73358 ops/s, 150237184 Bps
( 4096 byte blocks, 16 bytes x 256 updates): 20746 ops/s, 84975616 Bps
( 4096 byte blocks, 256 bytes x 16 updates): 34842 ops/s, 142712832 Bps
( 4096 byte blocks, 1024 bytes x 4 updates): 36794 ops/s, 150708224 Bps
( 4096 byte blocks, 4096 bytes x 1 updates): 37422 ops/s, 153280512 Bps
( 8192 byte blocks, 16 bytes x 512 updates): 10428 ops/s, 85426176 Bps
( 8192 byte blocks, 256 bytes x 32 updates): 17600 ops/s, 144179200 Bps
( 8192 byte blocks, 1024 bytes x 8 updates): 18594 ops/s, 152322048 Bps
( 8192 byte blocks, 4096 bytes x 2 updates): 18858 ops/s, 154484736 Bps
( 8192 byte blocks, 8192 bytes x 1 updates): 18880 ops/s, 154664960 Bps
testing speed of async sha256 (sha256-ce)
( 16 byte blocks, 16 bytes x 1 updates): 4107417 ops/s, 65718672 Bps
( 64 byte blocks, 16 bytes x 4 updates): 1418054 ops/s, 90755456 Bps
( 64 byte blocks, 64 bytes x 1 updates): 3323045 ops/s, 212674880 Bps
( 256 byte blocks, 16 bytes x 16 updates): 450084 ops/s, 115221504 Bps
( 256 byte blocks, 64 bytes x 4 updates): 1034376 ops/s, 264800256 Bps
( 256 byte blocks, 256 bytes x 1 updates): 1798744 ops/s, 460478464 Bps
( 1024 byte blocks, 16 bytes x 64 updates): 121411 ops/s, 124324864 Bps
( 1024 byte blocks, 256 bytes x 4 updates): 506086 ops/s, 518232064 Bps
( 1024 byte blocks, 1024 bytes x 1 updates): 634485 ops/s, 649712640 Bps
( 2048 byte blocks, 16 bytes x 128 updates): 61520 ops/s, 125992960 Bps
( 2048 byte blocks, 256 bytes x 8 updates): 266787 ops/s, 546379776 Bps
( 2048 byte blocks, 1024 bytes x 2 updates): 316910 ops/s, 649031680 Bps
( 2048 byte blocks, 2048 bytes x 1 updates): 342777 ops/s, 702007296 Bps
( 4096 byte blocks, 16 bytes x 256 updates): 31003 ops/s, 126988288 Bps
( 4096 byte blocks, 256 bytes x 16 updates): 138097 ops/s, 565645312 Bps
( 4096 byte blocks, 1024 bytes x 4 updates): 164319 ops/s, 673050624 Bps
( 4096 byte blocks, 4096 bytes x 1 updates): 176310 ops/s, 722165760 Bps
( 8192 byte blocks, 16 bytes x 512 updates): 15566 ops/s, 127516672 Bps
( 8192 byte blocks, 256 bytes x 32 updates): 69608 ops/s, 570228736 Bps
( 8192 byte blocks, 1024 bytes x 8 updates): 83682 ops/s, 685522944 Bps
( 8192 byte blocks, 4096 bytes x 2 updates): 88813 ops/s, 727556096 Bps
( 8192 byte blocks, 8192 bytes x 1 updates): 88781 ops/s, 727293952 Bps
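The speedup quoted in the summary can be checked against the listing above. The following is a minimal sketch, not part of the patch: it copies the ops/s figures of the single-update runs for sha256-generic and sha256-neon and computes the relative gain. The names GENERIC, NEON, speedup_percent and neon_speedups are made up for this illustration.

```python
# ops/s for the "x 1 updates" runs, copied from the tcrypt output above.
# Keys are block sizes in bytes. All names here are illustrative only.
GENERIC = {256: 309600, 1024: 93992, 2048: 48781, 4096: 24930, 8192: 12555}
NEON = {256: 453325, 1024: 140708, 2048: 73358, 4096: 37422, 8192: 18880}


def speedup_percent(baseline_ops, new_ops):
    """Return the throughput gain of new_ops over baseline_ops, in percent."""
    return (new_ops / baseline_ops - 1.0) * 100.0


def neon_speedups():
    """Map each block size to the sha256-neon gain over sha256-generic."""
    return {size: speedup_percent(GENERIC[size], NEON[size]) for size in GENERIC}


if __name__ == "__main__":
    for size, gain in sorted(neon_speedups().items()):
        print(f"{size:5d} byte blocks, 1 update: +{gain:.1f}%")
```

For these five data points the gain stays in the upper forties to about fifty percent, which is consistent with the summary.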
Ard Biesheuvel (1):
  crypto: arm64/sha256 - add support for SHA256 using NEON instructions

 arch/arm64/crypto/Kconfig               |   5 +
 arch/arm64/crypto/Makefile              |  11 +
 arch/arm64/crypto/sha256-armv4.pl       | 413 +++++++++
 arch/arm64/crypto/sha256-core.S_shipped | 883 ++++++++++++++++++++
 arch/arm64/crypto/sha256_neon_glue.c    | 103 +++
 5 files changed, 1415 insertions(+)
 create mode 100644 arch/arm64/crypto/sha256-armv4.pl
 create mode 100644 arch/arm64/crypto/sha256-core.S_shipped
 create mode 100644 arch/arm64/crypto/sha256_neon_glue.c
--
2.7.4