[PATCH v5 7/8] crypto: arm64/polyval: Add PMULL accelerated implementation of POLYVAL
From: Eric Biggers <ebiggers@kernel.org>
Date: Mon May 2 11:11:32 PDT 2022
On Sun, May 01, 2022 at 01:21:52PM -0700, Eric Biggers wrote:
> > +static int polyval_arm64_update(struct shash_desc *desc,
> > + const u8 *src, unsigned int srclen)
> > +{
> > + struct polyval_desc_ctx *dctx = shash_desc_ctx(desc);
> > + struct polyval_tfm_ctx *ctx = crypto_shash_ctx(desc->tfm);
> > + u8 *pos;
> > + unsigned int nblocks;
> > + unsigned int n;
> > +
> > + if (dctx->bytes) {
> > + n = min(srclen, dctx->bytes);
> > + pos = dctx->buffer + POLYVAL_BLOCK_SIZE - dctx->bytes;
> > +
> > + dctx->bytes -= n;
> > + srclen -= n;
> > +
> > + while (n--)
> > + *pos++ ^= *src++;
> > +
> > + if (!dctx->bytes)
> > + internal_polyval_mul(dctx->buffer,
> > + ctx->key_powers[NUM_KEY_POWERS-1]);
> > + }
> > +
> > + nblocks = srclen/POLYVAL_BLOCK_SIZE;
> > + internal_polyval_update(ctx, src, nblocks, dctx->buffer);
> > + srclen -= nblocks*POLYVAL_BLOCK_SIZE;
>
> This executes a kernel_neon_begin()/kernel_neon_end() section of unbounded
> length. kernel_neon_begin() disables preemption, so to allow the task to be
> preempted occasionally, the input needs to be handled in chunks, e.g. 4K at a
> time, as the existing code for other algorithms does. Something like the
> following would work:
>
> @@ -122,13 +118,16 @@ static int polyval_arm64_update(struct shash_desc *desc,
> ctx->key_powers[NUM_KEY_POWERS-1]);
> }
>
> - nblocks = srclen/POLYVAL_BLOCK_SIZE;
> - internal_polyval_update(ctx, src, nblocks, dctx->buffer);
> - srclen -= nblocks*POLYVAL_BLOCK_SIZE;
> + while (srclen >= POLYVAL_BLOCK_SIZE) {
> + /* Allow rescheduling every 4K bytes. */
> + nblocks = min(srclen, 4096U) / POLYVAL_BLOCK_SIZE;
> + internal_polyval_update(ctx, src, nblocks, dctx->buffer);
> + srclen -= nblocks * POLYVAL_BLOCK_SIZE;
> + src += nblocks * POLYVAL_BLOCK_SIZE;
> + }
>
> if (srclen) {
> dctx->bytes = POLYVAL_BLOCK_SIZE - srclen;
> - src += nblocks*POLYVAL_BLOCK_SIZE;
> pos = dctx->buffer;
> while (srclen--)
> *pos++ ^= *src++;
>
Also to be clear, this problem is specific to the "shash" API. You don't need
to worry about it for "skcipher" algorithms such as xctr(*), as they have to
walk a scatterlist to get their data, which is handed out at most a page at a
time, so preemption points occur naturally between pages.
- Eric