[PATCH v4] crypto: riscv/poly1305 - import OpenSSL/CRYPTOGAMS implementation

Tue Jun 24 04:08:35 PDT 2025

>> +#ifndef	__CHERI_PURE_CAPABILITY__
>> +	andi	$tmp0,$inp,7		# $inp % 8
>> +	andi	$inp,$inp,-8		# align $inp
>> +	slli	$tmp0,$tmp0,3		# byte to bit offset
>> +#endif
>> +	ld	$in0,0($inp)
>> +	ld	$in1,8($inp)
>> +#ifndef	__CHERI_PURE_CAPABILITY__
>> +	beqz	$tmp0,.Laligned_key
>> +
>> +	ld	$tmp2,16($inp)
>> +	neg	$tmp1,$tmp0		# implicit &63 in sll
>> +	srl	$in0,$in0,$tmp0
>> +	sll	$tmp3,$in1,$tmp1
>> +	srl	$in1,$in1,$tmp0
>> +	sll	$tmp2,$tmp2,$tmp1
>> +	or	$in0,$in0,$tmp3
>> +	or	$in1,$in1,$tmp2
>> +
>> +.Laligned_key:
> 
> This code is going through a lot of trouble to work on RISC-V CPUs that don't
> support efficient misaligned memory accesses.  That includes issuing loads of
> memory outside the bounds of the given buffer, which is questionable (even if
> it's guaranteed to not cross a page boundary).
> 
> Is there any chance we can just make the RISC-V Poly1305 code be conditional on
> CONFIG_RISCV_EFFICIENT_UNALIGNED_ACCESS=y?  Or do people not actually use that?

For reference: the penalty for handling unaligned data as above on a
processor that can handle unaligned loads efficiently is arguably
tolerable. On the Spacemit X60, for example, it's a meager ~7%.
However, since poly1305 is always used with chacha20 and is faster than
chacha20, the difference would be "masked" by chacha20's cost and
rendered marginal. If anything, it makes more sense to use this option
for chacha20, where the difference is far more significant, a tad less
than 20%.

Cheers.