[PATCH v4 4/8] crypto: x86/aesni-xctr: Add accelerated implementation of XCTR

Thu Apr 21 14:59:31 PDT 2022

On Mon, Apr 18, 2022 at 7:13 PM Eric Biggers <ebiggers at kernel.org> wrote:
>
> On Tue, Apr 12, 2022 at 05:28:12PM +0000, Nathan Huckleberry wrote:
> > diff --git a/arch/x86/crypto/aesni-intel_asm.S b/arch/x86/crypto/aesni-intel_asm.S
> > index 363699dd7220..ce17fe630150 100644
> > --- a/arch/x86/crypto/aesni-intel_asm.S
> > +++ b/arch/x86/crypto/aesni-intel_asm.S
> > @@ -2821,6 +2821,76 @@ SYM_FUNC_END(aesni_ctr_enc)
> >
> >  #endif
> >
> > +#ifdef __x86_64__
> > +/*
> > + * void aesni_xctr_enc(struct crypto_aes_ctx *ctx, u8 *dst, const u8 *src,
> > + *                 size_t len, u8 *iv, int byte_ctr)
> > + */
> > +SYM_FUNC_START(aesni_xctr_enc)
> > +     FRAME_BEGIN
> > +     cmp $16, LEN
> > +     jb .Lxctr_ret
> > +     shr     $4, %arg6
> > +     movq %arg6, CTR
> > +     mov 480(KEYP), KLEN
> > +     movups (IVP), IV
> > +     cmp $64, LEN
> > +     jb .Lxctr_enc_loop1
> > +.align 4
> > +.Lxctr_enc_loop4:
> > +     movaps IV, STATE1
> > +     vpaddq ONE(%rip), CTR, CTR
> > +     vpxor CTR, STATE1, STATE1
> > +     movups (INP), IN1
> > +     movaps IV, STATE2
> > +     vpaddq ONE(%rip), CTR, CTR
> > +     vpxor CTR, STATE2, STATE2
> > +     movups 0x10(INP), IN2
> > +     movaps IV, STATE3
> > +     vpaddq ONE(%rip), CTR, CTR
> > +     vpxor CTR, STATE3, STATE3
> > +     movups 0x20(INP), IN3
> > +     movaps IV, STATE4
> > +     vpaddq ONE(%rip), CTR, CTR
> > +     vpxor CTR, STATE4, STATE4
> > +     movups 0x30(INP), IN4
> > +     call _aesni_enc4
> > +     pxor IN1, STATE1
> > +     movups STATE1, (OUTP)
> > +     pxor IN2, STATE2
> > +     movups STATE2, 0x10(OUTP)
> > +     pxor IN3, STATE3
> > +     movups STATE3, 0x20(OUTP)
> > +     pxor IN4, STATE4
> > +     movups STATE4, 0x30(OUTP)
> > +     sub $64, LEN
> > +     add $64, INP
> > +     add $64, OUTP
> > +     cmp $64, LEN
> > +     jge .Lxctr_enc_loop4
> > +     cmp $16, LEN
> > +     jb .Lxctr_ret
> > +.align 4
> > +.Lxctr_enc_loop1:
> > +     movaps IV, STATE
> > +     vpaddq ONE(%rip), CTR, CTR
> > +     vpxor CTR, STATE1, STATE1
> > +     movups (INP), IN
> > +     call _aesni_enc1
> > +     pxor IN, STATE
> > +     movups STATE, (OUTP)
> > +     sub $16, LEN
> > +     add $16, INP
> > +     add $16, OUTP
> > +     cmp $16, LEN
> > +     jge .Lxctr_enc_loop1
> > +.Lxctr_ret:
> > +     FRAME_END
> > +     RET
> > +SYM_FUNC_END(aesni_xctr_enc)
> > +
> > +#endif
>
> Sorry, I missed this file.  This is the non-AVX version, right?  That means that
> AVX instructions, i.e. basically any instruction starting with "v", can't
> be used here.  So the above isn't going to work.  (There might be a way to test
> this with QEMU; maybe --cpu-type=Nehalem without --enable-kvm?)
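One caveat on the CPU model: Nehalem predates AES-NI, which first appeared in Westmere, so aesni-intel probably would not load at all on a Nehalem guest. Westmere has AES-NI and PCLMULQDQ but no AVX, so in QEMU's `-cpu` syntax something like `-cpu Westmere` (without KVM) looks like the closer match. To confirm what the guest actually exposes, a small user-space check can be run inside it. This is a sketch, not kernel code. `struct cpu_feats` and `detect_cpu_feats()` are hypothetical names, and it relies on GCC/Clang's `<cpuid.h>`:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Hypothetical summary of the features this patchset cares about. */
struct cpu_feats {
    bool aesni;  /* CPUID.1:ECX bit 25 */
    bool pclmul; /* CPUID.1:ECX bit 1 */
    bool avx;    /* CPUID.1:ECX bit 28, plus OS-enabled YMM state */
};

static struct cpu_feats detect_cpu_feats(void)
{
    struct cpu_feats f = { false, false, false };
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid() returns 0 if leaf 1 is not supported. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return f;
    f.aesni = (ecx >> 25) & 1;
    f.pclmul = (ecx >> 1) & 1;
    /* AVX is usable only if the CPU reports it (bit 28), the OS has
     * enabled XSAVE (OSXSAVE, bit 27), and XCR0 has both the XMM (bit 1)
     * and YMM (bit 2) state components enabled. */
    if (((ecx >> 28) & 1) && ((ecx >> 27) & 1)) {
        uint32_t lo, hi;
        __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
        (void)hi; /* only the low half of XCR0 matters here */
        f.avx = (lo & 0x6) == 0x6;
    }
#endif
    return f;
}
```

On a Westmere-model guest, this should report AES-NI and PCLMULQDQ present and AVX absent, which is the configuration that would exercise the non-AVX path.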
>
> You could rewrite this without using AVX instructions.  However, polyval-clmulni
> is broken in the same way; it uses AVX instructions without checking whether
> they are available.  But your patchset doesn't aim to provide a non-AVX polyval
> implementation at all.  So even if you got the non-AVX XCTR working, it wouldn't
> be paired with an accelerated polyval.
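For what it's worth, the per-block work in the loops above is just a 64-bit add and a 128-bit XOR. SSE2 also provides both (`paddq`, `pxor`), so a non-AVX rewrite looks mechanical. The portable model below is how I read what each iteration feeds to `_aesni_enc1`/`_aesni_enc4`. The counter starts at `byte_ctr / 16` (the `shr $4`) and is incremented before use. It is XORed little-endian into the low 8 bytes of the IV, and the high 8 bytes stay untouched because `movq` zeroes the upper lane of CTR. This is a user-space sketch only; `xctr_block_input()` is a hypothetical name:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical model of the AES input for block i of one call:
 * out = IV ^ (le64(first_ctr + i + 1) || 0^64).
 * first_ctr corresponds to byte_ctr / 16 in the quoted assembly.
 */
static void xctr_block_input(const uint8_t iv[16], uint64_t first_ctr,
                             size_t i, uint8_t out[16])
{
    /* The loop pre-increments CTR ("paddq ONE, CTR") before each use. */
    uint64_t ctr = first_ctr + (uint64_t)i + 1;

    memcpy(out, iv, 16);
    /* "pxor CTR, STATE": XOR the counter in little-endian byte order
     * into the low 8 bytes; the high 8 bytes of the IV pass through. */
    for (int b = 0; b < 8; b++)
        out[b] ^= (uint8_t)(ctr >> (8 * b));
}
```

Comparing such a model against the assembly for a few `byte_ctr` values is one cheap way to catch off-by-one errors in the counter handling.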
>
> So I think you should just not provide non-AVX versions for now.  That would
> mean:
>
>         1.) Drop the change to aesni-intel_asm.S
>         2.) Don't register the AES XCTR algorithm unless AVX is available
>             (in addition to AES-NI)

Is there a preferred way to conditionally register xctr? It looks like
aesni-intel_glue.c registers a default implementation of every
algorithm in the array and then enables faster variants depending on
CPU features. Should I remove xctr from the array and register it
separately?

>         3.) Don't register polyval-clmulni unless AVX is available
>             (in addition to CLMUL-NI)
>
> - Eric