[PATCH v2] crypto: arm/chacha-neon - optimize for non-block size multiples
Eric Biggers
ebiggers at kernel.org
Sat Dec 12 14:48:08 EST 2020
On Sat, Dec 12, 2020 at 08:24:24AM +0100, Ard Biesheuvel wrote:
> On Sat, 12 Dec 2020 at 07:43, Eric Biggers <ebiggers at kernel.org> wrote:
> >
> > Hi Ard,
> >
> > On Tue, Nov 03, 2020 at 05:28:09PM +0100, Ard Biesheuvel wrote:
> > > @@ -42,24 +42,24 @@ static void chacha_doneon(u32 *state, u8 *dst, const u8 *src,
> > >  {
> > >  	u8 buf[CHACHA_BLOCK_SIZE];
> > >
> > > -	while (bytes >= CHACHA_BLOCK_SIZE * 4) {
> > > -		chacha_4block_xor_neon(state, dst, src, nrounds);
> > > -		bytes -= CHACHA_BLOCK_SIZE * 4;
> > > -		src += CHACHA_BLOCK_SIZE * 4;
> > > -		dst += CHACHA_BLOCK_SIZE * 4;
> > > -		state[12] += 4;
> > > -	}
> > > -	while (bytes >= CHACHA_BLOCK_SIZE) {
> > > -		chacha_block_xor_neon(state, dst, src, nrounds);
> > > -		bytes -= CHACHA_BLOCK_SIZE;
> > > -		src += CHACHA_BLOCK_SIZE;
> > > -		dst += CHACHA_BLOCK_SIZE;
> > > -		state[12]++;
> > > +	while (bytes > CHACHA_BLOCK_SIZE) {
> > > +		unsigned int l = min(bytes, CHACHA_BLOCK_SIZE * 4U);
> > > +
> > > +		chacha_4block_xor_neon(state, dst, src, nrounds, l);
> > > +		bytes -= l;
> > > +		src += l;
> > > +		dst += l;
> > > +		state[12] += DIV_ROUND_UP(l, CHACHA_BLOCK_SIZE);
> > >  	}
> > >  	if (bytes) {
> > > -		memcpy(buf, src, bytes);
> > > -		chacha_block_xor_neon(state, buf, buf, nrounds);
> > > -		memcpy(dst, buf, bytes);
> > > +		const u8 *s = src;
> > > +		u8 *d = dst;
> > > +
> > > +		if (bytes != CHACHA_BLOCK_SIZE)
> > > +			s = d = memcpy(buf, src, bytes);
> > > +		chacha_block_xor_neon(state, d, s, nrounds);
> > > +		if (d != dst)
> > > +			memcpy(dst, buf, bytes);
> > >  	}
> > >  }
> > >
> >
> > Shouldn't this be incrementing the block counter after chacha_block_xor_neon()?
> > It might be needed by the library API.
> >
>
> Yeah, good point. 'bytes' could be exactly CHACHA_BLOCK_SIZE now,
> which wasn't the case before.
>
> I'll send a fix.
>
> > Also, even with that fixed, this patch is causing the self-tests (both the
> > chacha20poly1305_selftest(), and the crypto API tests for chacha20-neon,
> > xchacha20-neon, and xchacha12-neon) to fail when I boot a kernel in QEMU. This
> > doesn't happen on real hardware (Raspberry Pi 2), and I don't see any other bugs
> > in this patch, so I'm not sure what the problem is. Did you run the self-tests
> > on every platform you tested this on?
> >
>
> Does your QEMU lack this patch? I found that bug while working on this code.
>
> https://git.qemu.org/?p=qemu.git;a=commitdiff;h=604cef3e57eaeeef77074d78f6cf2eca1be11c62

It doesn't have that patch. That must be the problem then; good to hear that
you've already fixed it.
- Eric