[PATCH v6 6/9] crypto: arm64/aes-xctr: Improve readability of XCTR and CTR modes

Nathan Huckleberry nhuck at google.com
Fri May 6 14:22:46 PDT 2022


On Fri, May 6, 2022 at 12:41 AM Eric Biggers <ebiggers at kernel.org> wrote:
>
> On Wed, May 04, 2022 at 12:18:20AM +0000, Nathan Huckleberry wrote:
> > Added some clarifying comments, changed the register allocations to make
> > the code clearer, and added register aliases.
> >
> > Signed-off-by: Nathan Huckleberry <nhuck at google.com>
>
> I was a bit surprised to see this after the xctr support patch rather than
> before.  Doing the cleanup first would make adding and reviewing the xctr
> support easier.  But it's not a big deal; if you already tested it this way you
> can just leave it as-is if you want.
>
> A few minor comments below.
>
> > +     /*
> > +      * Set up the counter values in v0-v4.
> > +      *
> > +      * If we are encrypting less than MAX_STRIDE blocks, the tail block
> > +      * handling code expects the last keystream block to be in v4.  For
> > +      * example: if encrypting two blocks with MAX_STRIDE=5, then v3 and v4
> > +      * should have the next two counter blocks.
> > +      */
>
> The first two mentions of v4 should actually be v{MAX_STRIDE-1}, as it is
> actually v4 for MAX_STRIDE==5 and v3 for MAX_STRIDE==4.
>
> > @@ -355,16 +383,16 @@ AES_FUNC_END(aes_cbc_cts_decrypt)
> >       mov             v3.16b, vctr.16b
> >  ST5( mov             v4.16b, vctr.16b                )
> >       .if \xctr
> > -             sub             x6, x11, #MAX_STRIDE - 1
> > -             sub             x7, x11, #MAX_STRIDE - 2
> > -             sub             x8, x11, #MAX_STRIDE - 3
> > -             sub             x9, x11, #MAX_STRIDE - 4
> > -ST5(         sub             x10, x11, #MAX_STRIDE - 5       )
> > -             eor             x6, x6, x12
> > -             eor             x7, x7, x12
> > -             eor             x8, x8, x12
> > -             eor             x9, x9, x12
> > -             eor             x10, x10, x12
> > +             sub             x6, CTR, #MAX_STRIDE - 1
> > +             sub             x7, CTR, #MAX_STRIDE - 2
> > +             sub             x8, CTR, #MAX_STRIDE - 3
> > +             sub             x9, CTR, #MAX_STRIDE - 4
> > +ST5(         sub             x10, CTR, #MAX_STRIDE - 5       )
> > +             eor             x6, x6, IV_PART
> > +             eor             x7, x7, IV_PART
> > +             eor             x8, x8, IV_PART
> > +             eor             x9, x9, IV_PART
> > +             eor             x10, x10, IV_PART
>
> The eor into x10 should be enclosed by ST5(), since it's dead code otherwise.
>
> > +     /*
> > +      * If there are at least MAX_STRIDE blocks left, XOR the plaintext with
> > +      * keystream and store.  Otherwise jump to tail handling.
> > +      */
>
> Technically this could be XOR-ing with either the plaintext or the ciphertext.
> Maybe write "data" instead.
>
> >  .Lctrtail1x\xctr:
> > -     sub             x7, x6, #16
> > -     csel            x6, x6, x7, eq
> > -     add             x1, x1, x6
> > -     add             x0, x0, x6
> > -     ld1             {v5.16b}, [x1]
> > -     ld1             {v6.16b}, [x0]
> > +     /*
> > +      * Handle <= 16 bytes of plaintext
> > +      */
> > +     sub             x8, x7, #16
> > +     csel            x7, x7, x8, eq
> > +     add             IN, IN, x7
> > +     add             OUT, OUT, x7
> > +     ld1             {v5.16b}, [IN]
> > +     ld1             {v6.16b}, [OUT]
> >  ST5( mov             v3.16b, v4.16b                  )
> >       encrypt_block   v3, w3, x2, x8, w7
>
> w3 and x2 should be ROUNDS_W and KEY, respectively.
>
> This code also has the very unusual property that it reads and writes before the
> buffers given.  Specifically, for bytes < 16, it access the 16 bytes beginning
> at &in[bytes - 16] and &dst[bytes - 16].  Mentioning this explicitly would be
> very helpful, particularly in the function comments for aes_ctr_encrypt() and
> aes_xctr_encrypt(), and maybe in the C code, so that anyone calling these
> functions has this in mind.

If bytes < 16, then the C code uses a buffer of 16 bytes to avoid
this. I'll add some comments explaining that because its not entirely
clear what is happening in the C unless you've taken a deep dive into
the asm.

>
> Anyway, with the above addressed feel free to add:
>
>         Reviewed-by: Eric Biggers <ebiggers at google.com>
>
> - Eric



More information about the linux-arm-kernel mailing list