[PATCH v6 5/6] crypto: arm64/aes-ccm - reduce NEON begin/end calls for common case
Eric Biggers
ebiggers at kernel.org
Wed May 26 10:14:08 PDT 2021
On Wed, May 26, 2021 at 12:07:28PM +0200, Ard Biesheuvel wrote:
> AES-CCM (as used in WPA2 CCMP, for instance) typically involves
> authenticate-only data, and operates on a single network packet, and so
> the common case is for the authenticate, en/decrypt and finalize SIMD
> helpers to all be called exactly once in sequence. Since
> kernel_neon_end() now involves manipulation of the preemption state as
> well as the softirq mask state, let's reduce the number of times we are
> forced to call it to only once if we are handling this common case.
>
> Signed-off-by: Ard Biesheuvel <ardb at kernel.org>
> ---
> arch/arm64/crypto/aes-ce-ccm-core.S | 1 +
> arch/arm64/crypto/aes-ce-ccm-glue.c | 74 +++++++++++---------
> 2 files changed, 43 insertions(+), 32 deletions(-)
>
> diff --git a/arch/arm64/crypto/aes-ce-ccm-core.S b/arch/arm64/crypto/aes-ce-ccm-core.S
> index 99a028e298ed..8adff299fcd3 100644
> --- a/arch/arm64/crypto/aes-ce-ccm-core.S
> +++ b/arch/arm64/crypto/aes-ce-ccm-core.S
> @@ -124,6 +124,7 @@ SYM_FUNC_START(ce_aes_ccm_final)
> SYM_FUNC_END(ce_aes_ccm_final)
>
> .macro aes_ccm_do_crypt,enc
> + cbz x2, 5f
> ldr x8, [x6, #8] /* load lower ctr */
> ld1 {v0.16b}, [x5] /* load mac */
> CPU_LE( rev x8, x8 ) /* keep swabbed ctr in reg */
> diff --git a/arch/arm64/crypto/aes-ce-ccm-glue.c b/arch/arm64/crypto/aes-ce-ccm-glue.c
> index 54bd2494a000..98159f2c49ae 100644
> --- a/arch/arm64/crypto/aes-ce-ccm-glue.c
> +++ b/arch/arm64/crypto/aes-ce-ccm-glue.c
> @@ -97,10 +97,8 @@ static int ccm_init_mac(struct aead_request *req, u8 maciv[], u32 msglen)
> static void ccm_update_mac(struct crypto_aes_ctx *key, u8 mac[], u8 const in[],
> u32 abytes, u32 *macp)
> {
> - kernel_neon_begin();
> ce_aes_ccm_auth_data(mac, in, abytes, macp, key->key_enc,
> num_rounds(key));
> - kernel_neon_end();
> }
[...]
> + if (req->assoclen)
> + ccm_calculate_auth_mac(req, mac);
> +
This still makes all the associated data be processed under a single
kernel_neon_begin() / kernel_neon_end() pair, even if there is a large amount of
it. Shouldn't it be limited to a reasonable amount at a time, like 4K?

This sort of thing has been considered a bug before, e.g. see
commit 706024a52c6 ("crypto: arch/lib - limit simd usage to 4k chunks").

You could do the entire CCM operation under a single pair as long as there isn't
more than 4K of associated data.
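
To illustrate the chunking idea, here is a rough userspace sketch. It is not
the patch itself and not tested against the kernel tree. stub_neon_begin(),
stub_neon_end() and stub_auth_data() are hypothetical stand-ins for
kernel_neon_begin(), kernel_neon_end() and ce_aes_ccm_auth_data(), and
SIMD_CHUNK_MAX is an assumed name for the 4K limit. The stubs only count
sections and fold bytes into a toy MAC, so the loop structure can be checked
in isolation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed per-section limit, following the 4K suggestion above. */
#define SIMD_CHUNK_MAX 4096u

/* Hypothetical stand-ins for kernel_neon_begin()/kernel_neon_end(). */
static unsigned int neon_sections;	/* number of begin/end pairs seen */
static int neon_active;			/* nonzero while inside a section */

static void stub_neon_begin(void)
{
	assert(!neon_active);
	neon_active = 1;
	neon_sections++;
}

static void stub_neon_end(void)
{
	assert(neon_active);
	neon_active = 0;
}

/*
 * Hypothetical stand-in for ce_aes_ccm_auth_data(): XORs the input into a
 * 16-byte toy MAC. It does no real AES work; it only tracks the position.
 */
static void stub_auth_data(uint8_t mac[16], const uint8_t *in, size_t len,
			   uint32_t *macp)
{
	assert(neon_active);	/* must only run inside a section */
	for (size_t i = 0; i < len; i++) {
		mac[*macp] ^= in[i];
		*macp = (*macp + 1) % 16;
	}
}

/* Process associated data in sections of at most SIMD_CHUNK_MAX bytes. */
static void auth_chunked(uint8_t mac[16], const uint8_t *in, size_t len,
			 uint32_t *macp)
{
	while (len) {
		size_t n = len < SIMD_CHUNK_MAX ? len : SIMD_CHUNK_MAX;

		stub_neon_begin();
		stub_auth_data(mac, in, n, macp);
		stub_neon_end();	/* preemption point between chunks */

		in += n;
		len -= n;
	}
}
```

With this shape, associated data of 4K or less still costs exactly one
begin/end pair, so the common single-packet case keeps the benefit of the
patch, and only larger inputs pay for additional pairs.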
- Eric