[PATCH 2/2] crypto: sha1: add ARM NEON implementation
Jussi Kivilinna
jussi.kivilinna at iki.fi
Sun Jun 29 04:25:20 PDT 2014
On 28.06.2014 23:07, Ard Biesheuvel wrote:
> Hi Jussi,
>
> On 28 June 2014 12:40, Jussi Kivilinna <jussi.kivilinna at iki.fi> wrote:
>> This patch adds an ARM NEON assembly implementation of the SHA-1 algorithm.
>>
>> tcrypt benchmark results on Cortex-A8, sha1-arm-asm vs sha1-neon-asm:
>>
>> block-size   bytes/update   old-vs-new
>>         16             16        1.06x
>>         64             16        1.05x
>>         64             64        1.09x
>>        256             16        1.04x
>>        256             64        1.11x
>>        256            256        1.28x
>>       1024             16        1.04x
>>       1024            256        1.34x
>>       1024           1024        1.42x
>>       2048             16        1.04x
>>       2048            256        1.35x
>>       2048           1024        1.44x
>>       2048           2048        1.46x
>>       4096             16        1.04x
>>       4096            256        1.36x
>>       4096           1024        1.45x
>>       4096           4096        1.48x
>>       8192             16        1.04x
>>       8192            256        1.36x
>>       8192           1024        1.46x
>>       8192           4096        1.49x
>>       8192           8192        1.49x
>>
>
> This is a nice result: about the same speedup as OpenSSL when
> comparing the ALU asm implementation with the NEON.
>
>> Signed-off-by: Jussi Kivilinna <jussi.kivilinna at iki.fi>
>> ---
>> arch/arm/crypto/Makefile | 2
>> arch/arm/crypto/sha1-armv7-neon.S | 635 ++++++++++++++++++++++++++++++++++++
>> arch/arm/crypto/sha1_glue.c | 8
>> arch/arm/crypto/sha1_neon_glue.c | 197 +++++++++++
>> arch/arm/include/asm/crypto/sha1.h | 10 +
>> crypto/Kconfig | 11 +
>> 6 files changed, 860 insertions(+), 3 deletions(-)
>> create mode 100644 arch/arm/crypto/sha1-armv7-neon.S
>> create mode 100644 arch/arm/crypto/sha1_neon_glue.c
>> create mode 100644 arch/arm/include/asm/crypto/sha1.h
>>
>> diff --git a/arch/arm/crypto/Makefile b/arch/arm/crypto/Makefile
>> index 81cda39..374956d 100644
>> --- a/arch/arm/crypto/Makefile
>> +++ b/arch/arm/crypto/Makefile
>> @@ -5,10 +5,12 @@
>> obj-$(CONFIG_CRYPTO_AES_ARM) += aes-arm.o
>> obj-$(CONFIG_CRYPTO_AES_ARM_BS) += aes-arm-bs.o
>> obj-$(CONFIG_CRYPTO_SHA1_ARM) += sha1-arm.o
>> +obj-$(CONFIG_CRYPTO_SHA1_ARM_NEON) += sha1-arm-neon.o
>>
>> aes-arm-y := aes-armv4.o aes_glue.o
>> aes-arm-bs-y := aesbs-core.o aesbs-glue.o
>> sha1-arm-y := sha1-armv4-large.o sha1_glue.o
>> +sha1-arm-neon-y := sha1-armv7-neon.o sha1_neon_glue.o
>>
>> quiet_cmd_perl = PERL $@
>> cmd_perl = $(PERL) $(<) > $(@)
>> diff --git a/arch/arm/crypto/sha1-armv7-neon.S b/arch/arm/crypto/sha1-armv7-neon.S
>> new file mode 100644
>> index 0000000..beb1ed1
>> --- /dev/null
>> +++ b/arch/arm/crypto/sha1-armv7-neon.S
>> @@ -0,0 +1,635 @@
>> +/* sha1-armv7-neon.S - ARM/NEON accelerated SHA-1 transform function
>> + *
>> + * Copyright © 2013-2014 Jussi Kivilinna <jussi.kivilinna at iki.fi>
>> + *
>> + * This program is free software; you can redistribute it and/or modify it
>> + * under the terms of the GNU General Public License as published by the Free
>> + * Software Foundation; either version 2 of the License, or (at your option)
>> + * any later version.
>> + */
>> +
>> +.syntax unified
>> +#ifdef __thumb2__
>> +.thumb
>> +#else
>> +.code 32
>> +#endif
>
> This is all NEON code, which has no size benefit from being assembled
> as Thumb-2. (NEON instructions are 4 bytes in either case)
> If we drop the Thumb-2 versions, there's one less version to test.
>
Ok, I'll drop the .thumb part for both SHA1 and SHA512.
>> +.fpu neon
>> +
>> +.data
>> +
>> +#define GET_DATA_POINTER(reg, name, rtmp) ldr reg, =name
>> +
> [...]
>> +.align 4
>> +.LK_VEC:
>> +.LK1: .long K1, K1, K1, K1
>> +.LK2: .long K2, K2, K2, K2
>> +.LK3: .long K3, K3, K3, K3
>> +.LK4: .long K4, K4, K4, K4
>
> If you are going to put these constants in a different section, they
> belong in .rodata not .data.
> But why not just keep them in .text? In that case, you can replace the
> above 'ldr reg, =name' with 'adr reg, name' (or adrl if required) and
> get rid of the .ltorg and the literal pool.
>
Ok, I'll move these to .text.
Actually, I realized that these values can be loaded into the NEON
registers that are still free, for an additional speedup.
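
For readers following along without the full patch, here is a rough C
sketch of what the .LK1-.LK4 table holds. The K1..K4 values are the
standard SHA-1 round constants; the elided part of the patch presumably
defines them the same way, but that is an assumption. The names k_vec and
add_round_constant are made up for illustration, and the helper only
models a four-lane vector add of a round constant to message words; it is
not taken from the patch. On typical ELF toolchains a const-qualified
static object like this ends up in .rodata rather than .data, which is the
distinction Ard points out.

```c
#include <assert.h>
#include <stdint.h>

/* Standard SHA-1 round constants (assumed to match the patch's K1..K4). */
#define K1 0x5A827999u
#define K2 0x6ED9EBA1u
#define K3 0x8F1BBCDCu
#define K4 0xCA62C1D6u

/*
 * Hypothetical C mirror of the .LK1-.LK4 rows: each constant is
 * repeated four times so that one vector load fills all four lanes.
 * Being const, the table is read-only data, not writable data.
 */
static const uint32_t k_vec[4][4] = {
	{ K1, K1, K1, K1 },
	{ K2, K2, K2, K2 },
	{ K3, K3, K3, K3 },
	{ K4, K4, K4, K4 },
};

/*
 * Illustrative helper (not from the patch): add the round constant for
 * SHA-1 round 'round' (0..79) to four message words, lane by lane,
 * modelling a single four-lane 32-bit vector add.
 */
static void add_round_constant(uint32_t out[4], const uint32_t w[4],
			       unsigned int round)
{
	const uint32_t *k;
	int i;

	assert(round < 80);
	k = k_vec[round / 20];	/* the constant changes every 20 rounds */
	for (i = 0; i < 4; i++)
		out[i] = w[i] + k[i];	/* wraps modulo 2^32 */
}
```

Keeping all four constant vectors resident in otherwise unused registers
would remove the table load from the round loop entirely, which is the
extra speedup mentioned above.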
>> +/*
>> + * Transform nblks*64 bytes (nblks*16 32-bit words) at DATA.
>> + *
>> + * unsigned int
>> + * sha1_transform_neon (void *ctx, const unsigned char *data,
>> + * unsigned int nblks)
>> + */
>> +.align 3
>> +.globl sha1_transform_neon
>> +.type sha1_transform_neon,%function;
>> +
>> +sha1_transform_neon:
>
> ENTRY(sha1_transform_neon) [and matching ENDPROC() below]
Sure.
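
As a side note on the prototype quoted above: since the routine consumes
nblks*64 bytes, a caller has to pass only whole blocks and keep any tail
bytes for later. A minimal sketch of that split, assuming nothing about
the actual glue code in sha1_neon_glue.c; fake_transform is a
hypothetical stand-in for the assembly routine that merely counts blocks,
and feed_full_blocks is likewise an invented name.

```c
#include <stddef.h>
#include <stdint.h>

#define SHA1_BLOCK_SIZE 64u	/* bytes per SHA-1 block */

/* Number of blocks handed to the stand-in so far. */
static unsigned int blocks_seen;

/*
 * Hypothetical stand-in with the same shape as the quoted prototype.
 * It does no hashing; it only records how many blocks it was given.
 */
static unsigned int fake_transform(void *ctx, const unsigned char *data,
				   unsigned int nblks)
{
	(void)ctx;
	(void)data;
	blocks_seen += nblks;
	return 0;
}

/*
 * Pass all complete 64-byte blocks in 'data' to the transform in one
 * call and return the number of trailing bytes that did not fill a
 * block (the caller would buffer those until more data arrives).
 */
static size_t feed_full_blocks(void *ctx, const unsigned char *data,
			       size_t len)
{
	size_t nblks = len / SHA1_BLOCK_SIZE;

	if (nblks)
		fake_transform(ctx, data, (unsigned int)nblks);
	return len % SHA1_BLOCK_SIZE;
}
```

For example, a 200-byte buffer yields three full blocks and leaves eight
bytes over.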
-Jussi