[PATCH v2] crypto: arm/chacha-neon - optimize for non-block size multiples

Eric Biggers ebiggers at kernel.org
Sat Dec 12 14:48:08 EST 2020


On Sat, Dec 12, 2020 at 08:24:24AM +0100, Ard Biesheuvel wrote:
> On Sat, 12 Dec 2020 at 07:43, Eric Biggers <ebiggers at kernel.org> wrote:
> >
> > Hi Ard,
> >
> > On Tue, Nov 03, 2020 at 05:28:09PM +0100, Ard Biesheuvel wrote:
> > > @@ -42,24 +42,24 @@ static void chacha_doneon(u32 *state, u8 *dst, const u8 *src,
> > >  {
> > >       u8 buf[CHACHA_BLOCK_SIZE];
> > >
> > > -     while (bytes >= CHACHA_BLOCK_SIZE * 4) {
> > > -             chacha_4block_xor_neon(state, dst, src, nrounds);
> > > -             bytes -= CHACHA_BLOCK_SIZE * 4;
> > > -             src += CHACHA_BLOCK_SIZE * 4;
> > > -             dst += CHACHA_BLOCK_SIZE * 4;
> > > -             state[12] += 4;
> > > -     }
> > > -     while (bytes >= CHACHA_BLOCK_SIZE) {
> > > -             chacha_block_xor_neon(state, dst, src, nrounds);
> > > -             bytes -= CHACHA_BLOCK_SIZE;
> > > -             src += CHACHA_BLOCK_SIZE;
> > > -             dst += CHACHA_BLOCK_SIZE;
> > > -             state[12]++;
> > > +     while (bytes > CHACHA_BLOCK_SIZE) {
> > > +             unsigned int l = min(bytes, CHACHA_BLOCK_SIZE * 4U);
> > > +
> > > +             chacha_4block_xor_neon(state, dst, src, nrounds, l);
> > > +             bytes -= l;
> > > +             src += l;
> > > +             dst += l;
> > > +             state[12] += DIV_ROUND_UP(l, CHACHA_BLOCK_SIZE);
> > >       }
> > >       if (bytes) {
> > > -             memcpy(buf, src, bytes);
> > > -             chacha_block_xor_neon(state, buf, buf, nrounds);
> > > -             memcpy(dst, buf, bytes);
> > > +             const u8 *s = src;
> > > +             u8 *d = dst;
> > > +
> > > +             if (bytes != CHACHA_BLOCK_SIZE)
> > > +                     s = d = memcpy(buf, src, bytes);
> > > +             chacha_block_xor_neon(state, d, s, nrounds);
> > > +             if (d != dst)
> > > +                     memcpy(dst, buf, bytes);
> > >       }
> > >  }
> > >
> >
> > Shouldn't this be incrementing the block counter after chacha_block_xor_neon()?
> > It might be needed by the library API.
> >
> 
> Yeah, good point. 'bytes' could be exactly CHACHA_BLOCK_SIZE now,
> which wasn't the case before.
> 
> I'll send a fix.
> 
> > Also, even with that fixed, this patch is causing the self-tests (both
> > chacha20poly1305_selftest() and the crypto API tests for chacha20-neon,
> > xchacha20-neon, and xchacha12-neon) to fail when I boot a kernel in QEMU.  This
> > doesn't happen on real hardware (Raspberry Pi 2), and I don't see any other bugs
> > in this patch, so I'm not sure what the problem is.  Did you run the self-tests
> > on every platform you tested this on?
> >
> 
> Does your QEMU lack this patch? I found that bug while working on this code.
> 
> https://git.qemu.org/?p=qemu.git;a=commitdiff;h=604cef3e57eaeeef77074d78f6cf2eca1be11c62

It doesn't have that patch.  That must be the problem then; good to hear that
you've already fixed it.

- Eric
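The block counter bookkeeping raised in the review is that state[12] must advance by one for every 64-byte block of keystream consumed. This includes a tail that is exactly CHACHA_BLOCK_SIZE bytes, which the new `bytes > CHACHA_BLOCK_SIZE` loop condition now allows. That bookkeeping can be checked on a host with a small model. This is a sketch under stated assumptions. toy_ks(), toy_block_xor(), toy_4block_xor(), toy_reference() and chacha_doneon_model() are hypothetical names, not kernel functions, and the toy keystream is deliberately not ChaCha. MIN_U and DIV_ROUND_UP are local stand-ins for the kernel macros. Only the control flow of chacha_doneon_model() mirrors the patched chacha_doneon() quoted above, with the missing state[12]++ added after the single-block call.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Host-side stand-ins for the kernel definitions (assumed, not the real headers). */
#define CHACHA_BLOCK_SIZE 64U
#define MIN_U(a, b) ((a) < (b) ? (a) : (b))
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical toy keystream byte: NOT ChaCha, only a function of (counter, offset). */
static uint8_t toy_ks(uint32_t ctr, unsigned int off)
{
	return (uint8_t)(ctr * 131u + off);
}

/* Hypothetical stand-in for chacha_block_xor_neon(): one full 64-byte block. */
static void toy_block_xor(const uint32_t *state, uint8_t *dst, const uint8_t *src)
{
	for (unsigned int i = 0; i < CHACHA_BLOCK_SIZE; i++)
		dst[i] = src[i] ^ toy_ks(state[12], i);
}

/* Hypothetical stand-in for chacha_4block_xor_neon(..., len), len <= 256. */
static void toy_4block_xor(const uint32_t *state, uint8_t *dst,
			   const uint8_t *src, unsigned int len)
{
	for (unsigned int i = 0; i < len; i++)
		dst[i] = src[i] ^ toy_ks(state[12] + i / CHACHA_BLOCK_SIZE,
					 i % CHACHA_BLOCK_SIZE);
}

/* Same control flow as the patched chacha_doneon(), plus the counter fix. */
static void chacha_doneon_model(uint32_t *state, uint8_t *dst,
				const uint8_t *src, unsigned int bytes)
{
	/* Zeroed only so the model never XORs indeterminate bytes. */
	uint8_t buf[CHACHA_BLOCK_SIZE] = { 0 };

	while (bytes > CHACHA_BLOCK_SIZE) {
		unsigned int l = MIN_U(bytes, CHACHA_BLOCK_SIZE * 4U);

		toy_4block_xor(state, dst, src, l);
		bytes -= l;
		src += l;
		dst += l;
		/* A partial final block still consumes one counter value. */
		state[12] += DIV_ROUND_UP(l, CHACHA_BLOCK_SIZE);
	}
	if (bytes) {
		const uint8_t *s = src;
		uint8_t *d = dst;

		/* A partial block goes through buf; a full block is done in place. */
		if (bytes != CHACHA_BLOCK_SIZE)
			s = d = memcpy(buf, src, bytes);
		toy_block_xor(state, d, s);
		if (d != dst)
			memcpy(dst, buf, bytes);
		/* The increment raised in review: bytes may now be exactly one block. */
		state[12]++;
	}
}

/* Reference: stream byte n uses block counter ctr0 + n / 64, offset n % 64. */
static void toy_reference(uint32_t ctr0, uint8_t *dst, const uint8_t *src,
			  unsigned int bytes)
{
	for (unsigned int n = 0; n < bytes; n++)
		dst[n] = src[n] ^ toy_ks(ctr0 + n / CHACHA_BLOCK_SIZE,
					 n % CHACHA_BLOCK_SIZE);
}
```

With the increment in place, a second call on the same state continues the stream where the first one stopped, which is the property Eric suggests the library API may depend on. Deleting the final state[12]++ leaves the output of any single call unchanged. However, a 64-byte call then leaves the counter untouched, so the next call reuses that block's keystream.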
