[PATCH v4] crypto: riscv/poly1305 - import OpenSSL/CRYPTOGAMS implementation
Andy Polyakov
appro at cryptogams.org
Wed Jun 25 02:38:20 PDT 2025
>>>> +#ifndef __CHERI_PURE_CAPABILITY__
>>>> + andi $tmp0,$inp,7 # $inp % 8
>>>> + andi $inp,$inp,-8 # align $inp
>>>> + slli $tmp0,$tmp0,3 # byte to bit offset
>>>> +#endif
>>>> + ld $in0,0($inp)
>>>> + ld $in1,8($inp)
>>>> +#ifndef __CHERI_PURE_CAPABILITY__
>>>> + beqz $tmp0,.Laligned_key
>>>> +
>>>> + ld $tmp2,16($inp)
>>>> + neg $tmp1,$tmp0 # implicit &63 in sll
>>>> + srl $in0,$in0,$tmp0
>>>> + sll $tmp3,$in1,$tmp1
>>>> + srl $in1,$in1,$tmp0
>>>> + sll $tmp2,$tmp2,$tmp1
>>>> + or $in0,$in0,$tmp3
>>>> + or $in1,$in1,$tmp2
>>>> +
>>>> +.Laligned_key:
>>>
>>> This code is going through a lot of trouble to work on RISC-V CPUs that don't
>>> support efficient misaligned memory accesses. That includes issuing loads of
>>> memory outside the bounds of the given buffer, which is questionable (even if
>>> it's guaranteed to not cross a page boundary).
>>
>> It's indeed guaranteed to not cross a page *nor* even cache-line boundaries.
>> Hence they can't trigger any externally observed side effects the
>> corresponding unaligned loads won't. What is the concern otherwise? [Do note
>> that the boundaries are not crossed on a boundary-enforceable CHERI platform
>> ;-)]
>
> With this, we get:
>
> - More complex code.
My rationale is as follows. It's beneficial to have this code to cover
the whole spectrum of processor implementations. I for one would even
say it's important, because the penalties on processors that can't
handle misaligned accesses efficiently are simply too high to ignore.
Now, it's possible to bypass it with #ifdef-s (as done for CHERI), but
to make things less confusing, a.k.a. *less* complex, it's preferable
to rely on compiler predefines (again, as done for CHERI). Later
compiler versions introduced apparently suitable predefines for this,
__riscv_misaligned_slow/fast/avoid. However, as of this writing, the
macros in question don't seem to depend on the -mcpu parameter. But
it's probably reasonable to assume that they will at a later point. So
the suggestion would be to use these. Does that sound reasonable? Or
would you insist on a custom macro that would need to be set depending
on CONFIG_RISCV_EFFICIENT_UNALIGNED_ACCESS?
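To make the proposal concrete, here is a C sketch of what the quoted
assembly does, with the fast path selected at compile time. It is an
illustration under stated assumptions, not the patch itself:
`load_key_words` and `load64` are hypothetical names, the target is
assumed little-endian, and the #if simply accepts either
`__riscv_misaligned_fast` or `CONFIG_RISCV_EFFICIENT_UNALIGNED_ACCESS`
to show both options side by side. In C, stepping outside an object is
undefined behaviour, so the slow path is only well-defined when the
surrounding object covers every aligned word it touches; the assembly
is not bound by that rule.

```c
#include <stdint.h>
#include <string.h>

#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ != __ORDER_LITTLE_ENDIAN__
#error "this sketch assumes a little-endian target"
#endif

/* Plain 64-bit load; memcpy avoids strict-aliasing problems. */
static uint64_t load64(const uint8_t *p)
{
	uint64_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}

/*
 * Hypothetical helper mirroring the quoted assembly: fetch two
 * little-endian 64-bit words starting at a possibly misaligned inp.
 */
static void load_key_words(const uint8_t *inp, uint64_t *in0, uint64_t *in1)
{
/* Assumption: either macro signals that misaligned loads are cheap. */
#if defined(__riscv_misaligned_fast) || defined(CONFIG_RISCV_EFFICIENT_UNALIGNED_ACCESS)
	*in0 = load64(inp);
	*in1 = load64(inp + 8);
#else
	unsigned int off = (uintptr_t)inp & 7;   /* inp % 8 */
	const uint8_t *base = inp - off;         /* align inp down */
	unsigned int shift = off * 8;            /* byte to bit offset */
	uint64_t w0 = load64(base);
	uint64_t w1 = load64(base + 8);

	if (shift) {
		/* Third aligned word is needed only when inp is misaligned. */
		uint64_t w2 = load64(base + 16);

		/* shift is 8..56 here, so 64 - shift never reaches 64. */
		w0 = (w0 >> shift) | (w1 << (64 - shift));
		w1 = (w1 >> shift) | (w2 << (64 - shift));
	}
	*in0 = w0;
	*in1 = w1;
#endif
}
```

Built on a host that defines neither macro, the sketch exercises the
aligned-load-and-shift path.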
> - Slower on CPUs that do support efficient misaligned accesses.
With an arguably marginal penalty, as discussed in the previous message.
In this context one can also view it as a trade-off between a small
penalty and increased #ifdef spaghetti :-)
> - The buffer underflows and overflows could cause problems with future CPU
> behavior. (Did you consider the palette memory extension, for example?)
The palette memory extension colours fixed-size, and hence
correspondingly aligned, blocks. Since the block size is larger than
the word load size, any aligned load would be safe: it falls entirely
within one block, and because even a single "excess" or "short" byte of
the buffer in that block colours the whole block accordingly, the load
can't see a colour the buffer itself doesn't have.
Just to be clear: the argument is about loads. Misaligned stores are
naturally a different matter, and it would be inappropriate to handle
them in a similar manner.
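The boundary argument can be checked mechanically. Below is a small
model, assuming the input is the 16-byte key read by the quoted code
and that granules (cache lines, pages, or colour blocks) are a multiple
of 8 bytes in size; `granule` and `aligned_loads_stay_inside` are
hypothetical names. The key observation is that every aligned word
loaded overlaps the buffer by at least one byte, so it can only land in
a granule the buffer occupies anyway.

```c
#include <stdbool.h>
#include <stdint.h>

/* Index of the granule (cache line, page or colour block) holding addr. */
static uint64_t granule(uint64_t addr, uint64_t gsize)
{
	return addr / gsize;
}

/*
 * For a 16-byte input at address inp, model the aligned 8-byte loads
 * the quoted code issues and check that each one (a) does not straddle
 * a granule and (b) lands in a granule the input itself occupies.
 */
static bool aligned_loads_stay_inside(uint64_t inp, uint64_t gsize)
{
	uint64_t base = inp & ~(uint64_t)7;
	int nloads = (inp & 7) ? 3 : 2;  /* ld at 16(inp) only if misaligned */
	uint64_t first = granule(inp, gsize);
	uint64_t last = granule(inp + 15, gsize);

	for (int i = 0; i < nloads; i++) {
		uint64_t lo = base + 8 * (uint64_t)i;
		uint64_t g = granule(lo, gsize);

		if (g != granule(lo + 7, gsize))
			return false;    /* load straddles a boundary */
		if (g < first || g > last)
			return false;    /* load touches a foreign granule */
	}
	return true;
}
```

With granules of 8, 16, 64 or 4096 bytes the check holds for every
input address; with 4-byte granules it fails, which is why the block
size exceeding the load size matters.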
> That being said, if there will continue to be many RISC-V CPUs that don't
> support efficient misaligned accesses, then we effectively need to do this
> anyway. I hoped that things might be developing along the lines of ARM, where
> eventually misaligned accesses started being supported uniformly. But perhaps
> RISC-V is still in the process of learning that lesson.
One has to recognize that it can also be a matter of cost. Imagine you
want to license the least expensive IP from SiFive, or have very
limited space for an MCU. Linux, naturally having higher minimum
requirements, doesn't have to care about these, but that doesn't mean
nobody does :-)
>>> The rest of the kernel's RISC-V crypto code, which is based on the vector
>>> extension, just assumes that efficient misaligned memory accesses are supported.
>>
>> Was it tested on real hardware though? I wonder what hardware is out there
>> that supports the vector crypto extensions?
>
> If I remember correctly, SiFive tested it on their hardware.
Cool! The question was rather "how did it do performance-wise in the
context of this discussion," but never mind. Thanks! In a way there is a
contradiction. RISC-V as a concept is about openness to everybody, while
SiFive is naturally about itself ;-)
Cheers.