[PATCH v6 8/9] crypto: arm64/polyval: Add PMULL accelerated implementation of POLYVAL
Eric Biggers
ebiggers at kernel.org
Wed May 4 22:56:49 PDT 2022
On Wed, May 04, 2022 at 12:18:22AM +0000, Nathan Huckleberry wrote:
> + * X = [X_1 : X_0]
> + * Y = [Y_1 : Y_0]
> + *
> + * The multiplication produces four parts:
> + * LOW: The polynomial given by performing carryless multiplication of X_0 and
> + * Y_0
> + * MID: The polynomial given by performing carryless multiplication of (X_0 +
> + * X_1) and (Y_0 + Y_1)
> + * HIGH: The polynomial given by performing carryless multiplication of X_1
> + * and Y_1
> + *
> + * We compute:
> + * LO += LOW
> + * MI += MID
> + * HI += HIGH
Three parts, not four. But why not write this as the much more concise:
* Given:
* X = [X_1 : X_0]
* Y = [Y_1 : Y_0]
*
* We compute:
* LO += X_0 * Y_0
* MI += (X_0 + X_1) * (Y_0 + Y_1)
* HI += X_1 * Y_1
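To make the three-product form concrete, here is a minimal, portable C sketch of one 128x128-bit carryless multiplication done this way. It is only an illustration, not the patch's code: `struct poly128`, `clmul64()`, `karatsuba_mul128()` and `schoolbook_mul128()` are hypothetical names, `clmul64()` is a slow bit-by-bit stand-in for PMULL, and "+" on polynomials over GF(2) is XOR. The word layout of the result follows the `[HI_1 : HI_0 + HI_1 + MI_1 + LO_1 : LO_1 + HI_0 + MI_0 + LO_0 : LO_0]` comment quoted below.

```c
#include <stdint.h>

/* 128-bit polynomial over GF(2), i.e. [hi : lo]. Hypothetical helper type. */
struct poly128 {
	uint64_t lo;
	uint64_t hi;
};

/* Bit-by-bit 64x64 -> 128 carryless multiply (software stand-in for PMULL). */
static struct poly128 clmul64(uint64_t a, uint64_t b)
{
	struct poly128 r = { 0, 0 };

	for (int i = 0; i < 64; i++) {
		if ((b >> i) & 1) {
			r.lo ^= a << i;
			if (i)	/* avoid an undefined shift by 64 */
				r.hi ^= a >> (64 - i);
		}
	}
	return r;
}

/*
 * Karatsuba: three multiplications instead of four.
 * out[0] is the least significant 64-bit word, out[3] the most significant.
 */
static void karatsuba_mul128(struct poly128 x, struct poly128 y,
			     uint64_t out[4])
{
	struct poly128 lo = clmul64(x.lo, y.lo);		/* X_0 * Y_0 */
	struct poly128 hi = clmul64(x.hi, y.hi);		/* X_1 * Y_1 */
	struct poly128 mi = clmul64(x.lo ^ x.hi, y.lo ^ y.hi);	/* (X_0 + X_1) * (Y_0 + Y_1) */

	out[0] = lo.lo;
	out[1] = lo.hi ^ hi.lo ^ mi.lo ^ lo.lo;
	out[2] = hi.lo ^ hi.hi ^ mi.hi ^ lo.hi;
	out[3] = hi.hi;
}

/* Four-multiplication reference, used only to cross-check the sketch. */
static void schoolbook_mul128(struct poly128 x, struct poly128 y,
			      uint64_t out[4])
{
	struct poly128 ll = clmul64(x.lo, y.lo);
	struct poly128 lh = clmul64(x.lo, y.hi);
	struct poly128 hl = clmul64(x.hi, y.lo);
	struct poly128 hh = clmul64(x.hi, y.hi);

	out[0] = ll.lo;
	out[1] = ll.hi ^ lh.lo ^ hl.lo;
	out[2] = hh.lo ^ lh.hi ^ hl.hi;
	out[3] = hh.hi;
}
```

The sketch omits the accumulation into LO, MI and HI across blocks; in the accumulating form the middle-term fix-up is applied once after all blocks have been summed rather than per multiplication.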
> + * So our final computation is: T = T_1 : T_0 = g*(x) * P_0 V = V_1 : V_0 =
> + * g*(x) * (P_1 + T_0) p(x) / x^{128} mod g(x) = P_3 + P_1 + T_0 + V_1 : P_2 +
> + * P_0 + T_1 + V_0
As on the x86 version, this part is now unreadable. It was fine in v5.
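For reference, the quoted comment fuses three separate statements into one run of text. Split one per line, with brackets added to match the [X_1 : X_0] notation used earlier (the brackets are my addition, everything else is taken from the quoted text), they read:

```latex
\begin{aligned}
T &= [T_1 : T_0] = g^*(x) \cdot P_0 \\
V &= [V_1 : V_0] = g^*(x) \cdot (P_1 + T_0) \\
p(x) / x^{128} \bmod g(x) &= [P_3 + P_1 + T_0 + V_1 \;:\; P_2 + P_0 + T_1 + V_0]
\end{aligned}
```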
> + * [HI_1 : HI_0 + HI_1 + MI_1 + LO_1 : LO_1 + HI_0 + MI_0 + LO_0 : LO_0]
[...]
> + * [HI_1 : HI_1 + HI_0 + MI_1 + LO_1 : HI_0 + MI_0 + LO_1 + LO_0 : LO_0]
[...]
> + // TMP_V = T_1 : T_0 = P_0 * g*(x)
> + pmull TMP_V.1q, PL.1d, GSTAR.1d
[...]
> + // TMP_V = V_1 : V_0 = (P_1 + T_0) * g*(x)
> + pmull2 TMP_V.1q, GSTAR.2d, TMP_V.2d
> + eor DEST.16b, PH.16b, TMP_V.16b
[...]
> + pmull TMP_V.1q, GSTAR.1d, PL.1d
[...]
> + pmull2 TMP_V.1q, GSTAR.2d, TMP_V.2d
[...]
> + eor SUM.16b, TMP_V.16b, PH.16b
It looks like you didn't fully address my comments on v5 about putting operands
in a consistent order. Not a big deal, but assembly code is always hard to
read, and anything to make it easier would be greatly appreciated.
> +/*
> + * Handle any extra blocks afer full_stride loop.
> + */
Typo above: "afer" should be "after".
> diff --git a/arch/arm64/crypto/polyval-ce-glue.c b/arch/arm64/crypto/polyval-ce-glue.c
[...]
> +struct polyval_tfm_ctx {
> + u8 key_powers[NUM_KEY_POWERS][POLYVAL_BLOCK_SIZE];
> +};
This is missing the comment about the order of the key powers that I had
suggested for readability. It made it into the x86 version but not here. This
file is very similar to arch/x86/crypto/polyval-clmulni_glue.c, so if you could
diff them and eliminate any unintended differences, that would be helpful.
Other than the above readability suggestions, this patch looks good. Nice job.
- Eric