[PATCH v2] crypto: arm/chacha-neon - optimize for non-block size multiples

Ard Biesheuvel ardb at kernel.org
Sat Dec 12 02:24:24 EST 2020


On Sat, 12 Dec 2020 at 07:43, Eric Biggers <ebiggers at kernel.org> wrote:
>
> Hi Ard,
>
> On Tue, Nov 03, 2020 at 05:28:09PM +0100, Ard Biesheuvel wrote:
> > @@ -42,24 +42,24 @@ static void chacha_doneon(u32 *state, u8 *dst, const u8 *src,
> >  {
> >       u8 buf[CHACHA_BLOCK_SIZE];
> >
> > -     while (bytes >= CHACHA_BLOCK_SIZE * 4) {
> > -             chacha_4block_xor_neon(state, dst, src, nrounds);
> > -             bytes -= CHACHA_BLOCK_SIZE * 4;
> > -             src += CHACHA_BLOCK_SIZE * 4;
> > -             dst += CHACHA_BLOCK_SIZE * 4;
> > -             state[12] += 4;
> > -     }
> > -     while (bytes >= CHACHA_BLOCK_SIZE) {
> > -             chacha_block_xor_neon(state, dst, src, nrounds);
> > -             bytes -= CHACHA_BLOCK_SIZE;
> > -             src += CHACHA_BLOCK_SIZE;
> > -             dst += CHACHA_BLOCK_SIZE;
> > -             state[12]++;
> > +     while (bytes > CHACHA_BLOCK_SIZE) {
> > +             unsigned int l = min(bytes, CHACHA_BLOCK_SIZE * 4U);
> > +
> > +             chacha_4block_xor_neon(state, dst, src, nrounds, l);
> > +             bytes -= l;
> > +             src += l;
> > +             dst += l;
> > +             state[12] += DIV_ROUND_UP(l, CHACHA_BLOCK_SIZE);
> >       }
> >       if (bytes) {
> > -             memcpy(buf, src, bytes);
> > -             chacha_block_xor_neon(state, buf, buf, nrounds);
> > -             memcpy(dst, buf, bytes);
> > +             const u8 *s = src;
> > +             u8 *d = dst;
> > +
> > +             if (bytes != CHACHA_BLOCK_SIZE)
> > +                     s = d = memcpy(buf, src, bytes);
> > +             chacha_block_xor_neon(state, d, s, nrounds);
> > +             if (d != dst)
> > +                     memcpy(dst, buf, bytes);
> >       }
> >  }
> >
>
> Shouldn't this be incrementing the block counter after chacha_block_xor_neon()?
> It might be needed by the library API.
>

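Yeah, good point. With this patch, 'bytes' can be exactly
CHACHA_BLOCK_SIZE when we reach the final 'if (bytes)' branch, which
wasn't the case before, so the counter has to be incremented there too.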

I'll send a fix.
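
To illustrate the accounting problem, here is a rough standalone model
of the counter bookkeeping in the patched chacha_doneon(). It is a
userspace sketch, not the kernel code: counter_advance() and its
increment_in_tail flag are made-up names for this example, no cipher
work is done, and the flag only shows one possible way the tail branch
could be fixed.

```c
#define CHACHA_BLOCK_SIZE 64U
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Local helper standing in for the kernel's min() macro. */
static unsigned int min_uint(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/*
 * Hypothetical model: returns how far the block counter (state[12])
 * advances for an input of 'bytes' bytes, following the control flow
 * of the patched loop. With increment_in_tail == 0 the tail branch
 * behaves like the patch as posted; with 1 it also bumps the counter
 * after the single-block call (an assumed fix, for illustration only).
 */
static unsigned int counter_advance(unsigned int bytes, int increment_in_tail)
{
	unsigned int counter = 0;

	while (bytes > CHACHA_BLOCK_SIZE) {
		unsigned int l = min_uint(bytes, CHACHA_BLOCK_SIZE * 4U);

		/* chacha_4block_xor_neon() would run here */
		bytes -= l;
		counter += DIV_ROUND_UP(l, CHACHA_BLOCK_SIZE);
	}
	if (bytes) {
		/* chacha_block_xor_neon() would run here */
		if (increment_in_tail)
			counter++;
	}
	return counter;
}
```

In this model, counter_advance(64, 0) returns 0 and
counter_advance(320, 0) returns 4, i.e. one block short whenever the
input ends with exactly one full block in the tail branch, while
counter_advance(64, 1) and counter_advance(320, 1) return 1 and 5.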

> Also, even with that fixed, this patch is causing the self-tests (both the
> chacha20poly1305_selftest(), and the crypto API tests for chacha20-neon,
> xchacha20-neon, and xchacha12-neon) to fail when I boot a kernel in QEMU.  This
> doesn't happen on real hardware (Raspberry Pi 2), and I don't see any other bugs
> in this patch, so I'm not sure what the problem is.  Did you run the self-tests
> on every platform you tested this on?
>

Does your QEMU lack this patch? I found that bug while working on this code.

https://git.qemu.org/?p=qemu.git;a=commitdiff;h=604cef3e57eaeeef77074d78f6cf2eca1be11c62


