[PATCH v2] crypto: arm/chacha-neon - optimize for non-block size multiples
Ard Biesheuvel
ardb at kernel.org
Sat Dec 12 02:24:24 EST 2020
On Sat, 12 Dec 2020 at 07:43, Eric Biggers <ebiggers at kernel.org> wrote:
>
> Hi Ard,
>
> On Tue, Nov 03, 2020 at 05:28:09PM +0100, Ard Biesheuvel wrote:
> > @@ -42,24 +42,24 @@ static void chacha_doneon(u32 *state, u8 *dst, const u8 *src,
> > {
> > u8 buf[CHACHA_BLOCK_SIZE];
> >
> > - while (bytes >= CHACHA_BLOCK_SIZE * 4) {
> > - chacha_4block_xor_neon(state, dst, src, nrounds);
> > - bytes -= CHACHA_BLOCK_SIZE * 4;
> > - src += CHACHA_BLOCK_SIZE * 4;
> > - dst += CHACHA_BLOCK_SIZE * 4;
> > - state[12] += 4;
> > - }
> > - while (bytes >= CHACHA_BLOCK_SIZE) {
> > - chacha_block_xor_neon(state, dst, src, nrounds);
> > - bytes -= CHACHA_BLOCK_SIZE;
> > - src += CHACHA_BLOCK_SIZE;
> > - dst += CHACHA_BLOCK_SIZE;
> > - state[12]++;
> > + while (bytes > CHACHA_BLOCK_SIZE) {
> > + unsigned int l = min(bytes, CHACHA_BLOCK_SIZE * 4U);
> > +
> > + chacha_4block_xor_neon(state, dst, src, nrounds, l);
> > + bytes -= l;
> > + src += l;
> > + dst += l;
> > + state[12] += DIV_ROUND_UP(l, CHACHA_BLOCK_SIZE);
> > }
> > if (bytes) {
> > - memcpy(buf, src, bytes);
> > - chacha_block_xor_neon(state, buf, buf, nrounds);
> > - memcpy(dst, buf, bytes);
> > + const u8 *s = src;
> > + u8 *d = dst;
> > +
> > + if (bytes != CHACHA_BLOCK_SIZE)
> > + s = d = memcpy(buf, src, bytes);
> > + chacha_block_xor_neon(state, d, s, nrounds);
> > + if (d != dst)
> > + memcpy(dst, buf, bytes);
> > }
> > }
> >
>
> Shouldn't this be incrementing the block counter after chacha_block_xor_neon()?
> It might be needed by the library API.
>
Yeah, good point. 'bytes' can now be exactly CHACHA_BLOCK_SIZE when we
reach the tail, which wasn't the case before.
I'll send a fix.
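For illustration, here is a minimal userspace sketch of the patched chunking logic with the tail counter increment added. It is not the kernel code. `fake_nblock_xor()`, `doneon_sketch()` and `check_split()` are hypothetical names. The fake routine XORs each 64-byte block with a byte derived from its block counter in place of real ChaCha keystream, so the sketch only exercises the block counter bookkeeping.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CHACHA_BLOCK_SIZE 64U
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))
#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Hypothetical stand-in for the NEON routines: XORs each 64-byte block
 * with a byte derived from its block counter (NOT real ChaCha). */
static void fake_nblock_xor(const uint32_t *state, uint8_t *dst,
                            const uint8_t *src, unsigned int len)
{
    for (unsigned int i = 0; i < len; i++)
        dst[i] = src[i] ^ (uint8_t)(state[12] + i / CHACHA_BLOCK_SIZE);
}

/* Mirrors the structure of the patched chacha_doneon(), plus the
 * state[12]++ in the tail path that the review asked for. */
static void doneon_sketch(uint32_t *state, uint8_t *dst, const uint8_t *src,
                          unsigned int bytes)
{
    uint8_t buf[CHACHA_BLOCK_SIZE] = {0};

    /* Up to four blocks per call; the last chunk may be partial. */
    while (bytes > CHACHA_BLOCK_SIZE) {
        unsigned int l = MIN(bytes, CHACHA_BLOCK_SIZE * 4U);

        fake_nblock_xor(state, dst, src, l);
        bytes -= l;
        src += l;
        dst += l;
        state[12] += DIV_ROUND_UP(l, CHACHA_BLOCK_SIZE);
    }
    if (bytes) {
        /* 1..64 bytes remain; exactly 64 is now possible. */
        assert(bytes <= CHACHA_BLOCK_SIZE);
        memcpy(buf, src, bytes);
        fake_nblock_xor(state, buf, buf, CHACHA_BLOCK_SIZE);
        memcpy(dst, buf, bytes);
        state[12]++; /* without this, the next call reuses this counter */
    }
}

/* Hypothetical test helper: processes len bytes in two calls, split at
 * `first` (a multiple of 64, len <= 1024), and checks output and final
 * counter against a per-byte reference. Returns 1 on success. */
int check_split(unsigned int len, unsigned int first)
{
    uint8_t src[1024], dst[1024];
    uint32_t state[16] = {0};

    for (unsigned int i = 0; i < len; i++)
        src[i] = (uint8_t)(i * 7);
    doneon_sketch(state, dst, src, first);
    doneon_sketch(state, dst + first, src + first, len - first);
    for (unsigned int i = 0; i < len; i++)
        if (dst[i] != (uint8_t)(src[i] ^ (i / CHACHA_BLOCK_SIZE)))
            return 0;
    return state[12] == DIV_ROUND_UP(len, CHACHA_BLOCK_SIZE);
}
```

Splitting 384 bytes at 320 exercises the case Eric flagged. In the first call, the loop consumes 256 bytes and leaves exactly one full block for the tail. Without the `state[12]++`, the second call would start at counter 4 again and reuse that keystream block.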
> Also, even with that fixed, this patch is causing the self-tests (both the
> chacha20poly1305_selftest(), and the crypto API tests for chacha20-neon,
> xchacha20-neon, and xchacha12-neon) to fail when I boot a kernel in QEMU. This
> doesn't happen on real hardware (Raspberry Pi 2), and I don't see any other bugs
> in this patch, so I'm not sure what the problem is. Did you run the self-tests
> on every platform you tested this on?
>
Does your QEMU lack this patch? I found that bug while working on this code.
https://git.qemu.org/?p=qemu.git;a=commitdiff;h=604cef3e57eaeeef77074d78f6cf2eca1be11c62