[PATCH v5 7/8] crypto: arm64/polyval: Add PMULL accelerated implementation of POLYVAL

Eric Biggers ebiggers at kernel.org
Mon May 2 11:11:32 PDT 2022


On Sun, May 01, 2022 at 01:21:52PM -0700, Eric Biggers wrote:
> > +static int polyval_arm64_update(struct shash_desc *desc,
> > +			 const u8 *src, unsigned int srclen)
> > +{
> > +	struct polyval_desc_ctx *dctx = shash_desc_ctx(desc);
> > +	struct polyval_tfm_ctx *ctx = crypto_shash_ctx(desc->tfm);
> > +	u8 *pos;
> > +	unsigned int nblocks;
> > +	unsigned int n;
> > +
> > +	if (dctx->bytes) {
> > +		n = min(srclen, dctx->bytes);
> > +		pos = dctx->buffer + POLYVAL_BLOCK_SIZE - dctx->bytes;
> > +
> > +		dctx->bytes -= n;
> > +		srclen -= n;
> > +
> > +		while (n--)
> > +			*pos++ ^= *src++;
> > +
> > +		if (!dctx->bytes)
> > +			internal_polyval_mul(dctx->buffer,
> > +				ctx->key_powers[NUM_KEY_POWERS-1]);
> > +	}
> > +
> > +	nblocks = srclen/POLYVAL_BLOCK_SIZE;
> > +	internal_polyval_update(ctx, src, nblocks, dctx->buffer);
> > +	srclen -= nblocks*POLYVAL_BLOCK_SIZE;
> 
> This executes a kernel_neon_begin()/kernel_neon_end() section of unbounded
> length.  To allow the task to be preempted occasionally, it needs to handle the
> input in chunks, e.g. 4K at a time, like the existing code for other algorithms
> does.  Something like the following would work:
> 
> @@ -122,13 +118,16 @@ static int polyval_arm64_update(struct shash_desc *desc,
>  				ctx->key_powers[NUM_KEY_POWERS-1]);
>  	}
>  
> -	nblocks = srclen/POLYVAL_BLOCK_SIZE;
> -	internal_polyval_update(ctx, src, nblocks, dctx->buffer);
> -	srclen -= nblocks*POLYVAL_BLOCK_SIZE;
> +	while (srclen >= POLYVAL_BLOCK_SIZE) {
> +		/* Allow rescheduling every 4K bytes. */
> +		nblocks = min(srclen, 4096U) / POLYVAL_BLOCK_SIZE;
> +		internal_polyval_update(ctx, src, nblocks, dctx->buffer);
> +		srclen -= nblocks * POLYVAL_BLOCK_SIZE;
> +		src += nblocks * POLYVAL_BLOCK_SIZE;
> +	}
>  
>  	if (srclen) {
>  		dctx->bytes = POLYVAL_BLOCK_SIZE - srclen;
> -		src += nblocks*POLYVAL_BLOCK_SIZE;
>  		pos = dctx->buffer;
>  		while (srclen--)
>  			*pos++ ^= *src++;
> 

Also to be clear, this problem is specific to the "shash" API.  You don't need
to worry about it for "skcipher" algorithms such as xctr(*), as they have to
walk a scatterlist to get their data, and that happens a page at a time.

- Eric

