[PATCH v4 4/8] crypto: x86/aesni-xctr: Add accelerated implementation of XCTR
Eric Biggers
ebiggers at kernel.org
Mon Apr 18 17:13:54 PDT 2022
On Tue, Apr 12, 2022 at 05:28:12PM +0000, Nathan Huckleberry wrote:
> diff --git a/arch/x86/crypto/aesni-intel_asm.S b/arch/x86/crypto/aesni-intel_asm.S
> index 363699dd7220..ce17fe630150 100644
> --- a/arch/x86/crypto/aesni-intel_asm.S
> +++ b/arch/x86/crypto/aesni-intel_asm.S
> @@ -2821,6 +2821,76 @@ SYM_FUNC_END(aesni_ctr_enc)
>
> #endif
>
> +#ifdef __x86_64__
> +/*
> + * void aesni_xctr_enc(struct crypto_aes_ctx *ctx, const u8 *dst, u8 *src,
> + * size_t len, u8 *iv, int byte_ctr)
> + */
> +SYM_FUNC_START(aesni_xctr_enc)
> + FRAME_BEGIN
> + cmp $16, LEN
> + jb .Lxctr_ret
> + shr $4, %arg6
> + movq %arg6, CTR
> + mov 480(KEYP), KLEN
> + movups (IVP), IV
> + cmp $64, LEN
> + jb .Lxctr_enc_loop1
> +.align 4
> +.Lxctr_enc_loop4:
> + movaps IV, STATE1
> + vpaddq ONE(%rip), CTR, CTR
> + vpxor CTR, STATE1, STATE1
> + movups (INP), IN1
> + movaps IV, STATE2
> + vpaddq ONE(%rip), CTR, CTR
> + vpxor CTR, STATE2, STATE2
> + movups 0x10(INP), IN2
> + movaps IV, STATE3
> + vpaddq ONE(%rip), CTR, CTR
> + vpxor CTR, STATE3, STATE3
> + movups 0x20(INP), IN3
> + movaps IV, STATE4
> + vpaddq ONE(%rip), CTR, CTR
> + vpxor CTR, STATE4, STATE4
> + movups 0x30(INP), IN4
> + call _aesni_enc4
> + pxor IN1, STATE1
> + movups STATE1, (OUTP)
> + pxor IN2, STATE2
> + movups STATE2, 0x10(OUTP)
> + pxor IN3, STATE3
> + movups STATE3, 0x20(OUTP)
> + pxor IN4, STATE4
> + movups STATE4, 0x30(OUTP)
> + sub $64, LEN
> + add $64, INP
> + add $64, OUTP
> + cmp $64, LEN
> + jge .Lxctr_enc_loop4
> + cmp $16, LEN
> + jb .Lxctr_ret
> +.align 4
> +.Lxctr_enc_loop1:
> + movaps IV, STATE
> + vpaddq ONE(%rip), CTR, CTR
> + vpxor CTR, STATE1, STATE1
> + movups (INP), IN
> + call _aesni_enc1
> + pxor IN, STATE
> + movups STATE, (OUTP)
> + sub $16, LEN
> + add $16, INP
> + add $16, OUTP
> + cmp $16, LEN
> + jge .Lxctr_enc_loop1
> +.Lxctr_ret:
> + FRAME_END
> + RET
> +SYM_FUNC_END(aesni_xctr_enc)
> +
> +#endif
Sorry, I missed this file. This is the non-AVX version, right? That means that
AVX instructions, i.e. basically any instruction starting with "v", can't
be used here. So the above isn't going to work. (There might be a way to test
this with QEMU; maybe -cpu Nehalem without --enable-kvm?)
You could rewrite this without using AVX instructions. However, polyval-clmulni
is broken in the same way; it uses AVX instructions without checking whether
they are available. But your patchset doesn't aim to provide a non-AVX polyval
implementation at all. So even if you got the non-AVX XCTR working, it wouldn't
be paired with an accelerated polyval.
So I think you should just not provide non-AVX versions for now. That would
mean:
1.) Drop the change to aesni-intel_asm.S
2.) Don't register the AES XCTR algorithm unless AVX is available
(in addition to AES-NI)
3.) Don't register polyval-clmulni unless AVX is available
(in addition to CLMUL-NI)
- Eric