[PATCH v4] crypto: riscv/poly1305 - import OpenSSL/CRYPTOGAMS implementation

Wed Jun 25 02:38:20 PDT 2025

>>>> +#ifndef	__CHERI_PURE_CAPABILITY__
>>>> +	andi	$tmp0,$inp,7		# $inp % 8
>>>> +	andi	$inp,$inp,-8		# align $inp
>>>> +	slli	$tmp0,$tmp0,3		# byte to bit offset
>>>> +#endif
>>>> +	ld	$in0,0($inp)
>>>> +	ld	$in1,8($inp)
>>>> +#ifndef	__CHERI_PURE_CAPABILITY__
>>>> +	beqz	$tmp0,.Laligned_key
>>>> +
>>>> +	ld	$tmp2,16($inp)
>>>> +	neg	$tmp1,$tmp0		# implicit &63 in sll
>>>> +	srl	$in0,$in0,$tmp0
>>>> +	sll	$tmp3,$in1,$tmp1
>>>> +	srl	$in1,$in1,$tmp0
>>>> +	sll	$tmp2,$tmp2,$tmp1
>>>> +	or	$in0,$in0,$tmp3
>>>> +	or	$in1,$in1,$tmp2
>>>> +
>>>> +.Laligned_key:
>>>
>>> This code is going through a lot of trouble to work on RISC-V CPUs that don't
>>> support efficient misaligned memory accesses.  That includes issuing loads of
>>> memory outside the bounds of the given buffer, which is questionable (even if
>>> it's guaranteed to not cross a page boundary).
>>
>> It's indeed guaranteed to not cross a page *nor* even cache-line boundaries.
>> Hence they can't trigger any externally observed side effects the
>> corresponding unaligned loads won't. What is the concern otherwise? [Do note
>> that the boundaries are not crossed on a boundary-enforcable CHERI platform
>> ;-)]
> 
> With this, we get:
> 
> - More complex code.

My rationale is as follows. It's beneficial to have this code to cover
the whole spectrum of processor implementations. I for one would even
say it's important, because the penalties on processors that can't
handle misaligned accesses efficiently are just too high to ignore. Now,
it's possible to bypass it with #ifdef-s (as done for CHERI), but to
make things less confusing, a.k.a. *less* complex, it's preferable to
rely on compiler predefines (again, as done for CHERI). Recent compiler
versions introduced apparently suitable predefines for this,
__riscv_misaligned_slow/fast/avoid. However, as of this writing the
macros in question don't seem to depend on the -mcpu parameter. But it's
probably reasonable to assume that they will at a later point. So the
suggestion would be to use these. Does that sound reasonable? Or would
you insist on a custom macro that would need to be set depending on
CONFIG_RISCV_EFFICIENT_UNALIGNED_ACCESS?

> - Slower on CPUs that do support efficient misaligned accesses.

With an arguably marginal penalty, as discussed in the previous message.
In this context one can also view it as a trade-off between a small
penalty and increased #ifdef spaghetti :-)

> - The buffer underflows and overflows could cause problems with future CPU
>    behavior.  (Did you consider the palette memory extension, for example?)

A palette memory extension colours fixed-size, hence accordingly
aligned, blocks. Since the block size is larger than the word load size,
any aligned load would be safe, because even a single "excess" or
"short" byte would colour the whole block accordingly.
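
The containment claim can be checked with a few lines of C. This is
only a sketch: word_within_block() is a hypothetical helper, and the
block size is an assumed parameter (a power of two, at least 8 bytes),
since no concrete extension is being quoted here.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Returns 1 if the aligned 8-byte word that holds byte_addr lies entirely
 * inside the coloured block that holds byte_addr.
 * block_size is an assumed granule: a power of two, at least 8 bytes.
 */
static int word_within_block(uintptr_t byte_addr, uintptr_t block_size)
{
	uintptr_t word_first = byte_addr & ~(uintptr_t)7;
	uintptr_t word_last = word_first + 7;
	uintptr_t block_first = byte_addr & ~(block_size - 1);
	uintptr_t block_last = block_first + block_size - 1;

	assert(block_size >= 8 && (block_size & (block_size - 1)) == 0);
	return word_first >= block_first && word_last <= block_last;
}
```

Every aligned word the sequence loads contains at least one byte of the
requested input, so each such load stays inside a block that the input
already colours.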

Just to be clear, the argument is about loads. Misaligned stores are
naturally a different matter, and it would be inappropriate to handle
them in a similar manner.
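
For reference, the load sequence under discussion can be modelled in C
roughly as follows. This is a sketch under stated assumptions, not the
patch itself: load128_aligned_model() is a hypothetical name, the host
is assumed to be little-endian, and because C does not permit touching
bytes outside the object, the caller has to pass a buffer that is
padded out to the surrounding aligned words.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Fetch the 16 bytes at inp using only 8-byte-aligned word reads.
 * Assumes a little-endian host and that the aligned words surrounding
 * [inp, inp + 16) belong to the caller's buffer.
 */
static void load128_aligned_model(const unsigned char *inp, uint64_t out[2])
{
	unsigned int shift = (unsigned int)((uintptr_t)inp & 7) * 8;
	const unsigned char *base = inp - ((uintptr_t)inp & 7);
	uint64_t in0, in1, in2;

	assert(inp != NULL);
	memcpy(&in0, base, 8);		/* ld $in0,0($inp) */
	memcpy(&in1, base + 8, 8);	/* ld $in1,8($inp) */

	if (shift != 0) {
		/* The asm relies on the implicit &63 in sll; C must spell it out. */
		unsigned int lshift = (0u - shift) & 63;

		memcpy(&in2, base + 16, 8);	/* ld $tmp2,16($inp) */
		in0 = (in0 >> shift) | (in1 << lshift);
		in1 = (in1 >> shift) | (in2 << lshift);
	}
	out[0] = in0;
	out[1] = in1;
}
```

The third word is read only when the input is misaligned, which matches
the beqz that skips straight to .Laligned_key.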

> That being said, if there will continue to be many RISC-V CPUs that don't
> support efficient misaligned accesses, then we effectively need to do this
> anyway.  I hoped that things might be developing along the lines of ARM, where
> eventually misaligned accesses started being supported uniformly.  But perhaps
> RISC-V is still in the process of learning that lesson.

One has to recognize that it can also be a matter of cost. I mean,
imagine you want to license the least expensive IP from SiFive, or have
very limited space for an MCU. Well, Linux, naturally having higher
minimum requirements, doesn't have to care about these, but it doesn't
mean that nobody would :-)

>>> The rest of the kernel's RISC-V crypto code, which is based on the vector
>>> extension, just assumes that efficient misaligned memory accesses are supported.
>>
>> Was it tested on real hardware though? I wonder what hardware is out there
>> that supports the vector crypto extensions?
> 
> If I remember correctly, SiFive tested it on their hardware.

Cool! The question was rather "how did it do performance-wise in the 
context of this discussion," but never mind. Thanks! In a way there is a 
contradiction. RISC-V as a concept is about openness to everybody, while 
SiFive is naturally about itself ;-)

Cheers.