[PATCH v2] crypto: arm/chacha-neon - optimize for non-block size multiples

Herbert Xu herbert at gondor.apana.org.au
Fri Nov 13 00:10:17 EST 2020


On Tue, Nov 03, 2020 at 05:28:09PM +0100, Ard Biesheuvel wrote:
> The current NEON based ChaCha implementation for ARM is optimized for
> multiples of 4x the ChaCha block size (64 bytes). This makes sense for
> block encryption, but given that ChaCha is also often used in the
> context of networking, it makes sense to consider arbitrary length
> inputs as well.
> 
> For example, WireGuard typically uses 1420 byte packets, and performing
> ChaCha encryption involves 5 invocations of chacha_4block_xor_neon()
> and 3 invocations of chacha_block_xor_neon(), where the last one also
> involves a memcpy() using a buffer on the stack to process the final
> chunk of 1420 % 64 == 12 bytes.
> 
> Let's optimize for this case as well, by letting chacha_4block_xor_neon()
> deal with any input size between 64 and 256 bytes, using NEON permutation
> instructions and overlapping loads and stores. This way, the 140 byte
> tail of a 1420 byte input buffer can simply be processed in one go.
> 
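[The call counts quoted above can be checked with a small user-space sketch.
It only counts calls and runs no ChaCha code. The struct and function names
are hypothetical. The "new" dispatch rule is an assumption based on the
description and the v2 notes, not a copy of the glue code: anything larger
than one block goes to the 4-block routine in chunks of up to 256 bytes, and
a remainder of at most 64 bytes goes to the single-block routine.]

```c
#define CHACHA_BLOCK_SIZE 64u   /* ChaCha block size in bytes */

/* Illustrative result type; not taken from the kernel sources. */
struct call_counts {
    unsigned int four_block;    /* calls to the 4-block routine */
    unsigned int one_block;     /* calls to the single-block routine */
    unsigned int tail_bytes;    /* size of a final partial block, 0 if none */
};

/* Old scheme: the 4-block routine only accepts full 256-byte chunks. */
static struct call_counts count_calls_old(unsigned int bytes)
{
    struct call_counts c = {0, 0, 0};

    c.four_block = bytes / (4 * CHACHA_BLOCK_SIZE);
    bytes %= 4 * CHACHA_BLOCK_SIZE;
    c.one_block = bytes / CHACHA_BLOCK_SIZE;
    c.tail_bytes = bytes % CHACHA_BLOCK_SIZE;
    if (c.tail_bytes)
        c.one_block++;          /* partial block needs the stack buffer */
    return c;
}

/*
 * New scheme (assumed dispatch): the 4-block routine takes any chunk
 * larger than 64 and up to 256 bytes; what is left (at most 64 bytes)
 * goes to the single-block routine.
 */
static struct call_counts count_calls_new(unsigned int bytes)
{
    struct call_counts c = {0, 0, 0};

    while (bytes > CHACHA_BLOCK_SIZE) {
        unsigned int chunk = bytes < 4 * CHACHA_BLOCK_SIZE
                           ? bytes : 4 * CHACHA_BLOCK_SIZE;

        c.four_block++;
        bytes -= chunk;
    }
    if (bytes) {
        c.one_block++;
        if (bytes != CHACHA_BLOCK_SIZE)
            c.tail_bytes = bytes;   /* only a partial block needs memcpy() */
    }
    return c;
}
```

[For 1420 bytes, count_calls_old() gives 5 four-block calls, 3 single-block
calls and a 12 byte tail, matching the description. Under the assumed rule,
count_calls_new() gives 6 four-block calls and no single-block call, since
the 140 byte tail is handled in one go.]
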
> This results in the following performance improvements for 1420 byte
> blocks, without significant impact on power-of-2 input sizes. (Note
> that Raspberry Pi is widely used in combination with a 32-bit kernel,
> even though the core is 64-bit capable.)
> 
>    Cortex-A8  (BeagleBone)       :   7%
>    Cortex-A15 (Calxeda Midway)   :  21%
>    Cortex-A53 (Raspberry Pi 3)   :   3%
>    Cortex-A72 (Raspberry Pi 4)   :  19%
> 
> Cc: Eric Biggers <ebiggers at google.com>
> Cc: "Jason A . Donenfeld" <Jason at zx2c4.com>
> Signed-off-by: Ard Biesheuvel <ardb at kernel.org>
> ---
> v2:
> - avoid memcpy() if the residual byte count is exactly 64 bytes
> - get rid of register based post increments, and simply rewind the src
>   pointer as needed (the dst pointer did not need the register post
>   increment in the first place)
> - add benchmark results for 32-bit CPUs to commit log.
> 
>  arch/arm/crypto/chacha-glue.c      | 34 +++----
>  arch/arm/crypto/chacha-neon-core.S | 97 ++++++++++++++++++--
>  2 files changed, 107 insertions(+), 24 deletions(-)

Patch applied.  Thanks.
-- 
Email: Herbert Xu <herbert at gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


