[PATCH v4] crypto: riscv/poly1305 - import OpenSSL/CRYPTOGAMS implementation
Andy Polyakov
appro at cryptogams.org
Tue Jun 24 02:13:49 PDT 2025
>> +.globl poly1305_init
>> +.type poly1305_init,\@function
>> +poly1305_init:
>> +#ifdef __riscv_zicfilp
>> + lpad 0
>> +#endif
>
> The 'lpad' instructions aren't present in the upstream CRYPTOGAMS source.
They are.
> If they are necessary, this addition needs to be documented.
>
> But they appear to be unnecessary.
They had better be there if Control Flow Integrity is on. It's the same
deal as with the endbranch instruction on Intel and hint #34 on ARM. It's
possible that the kernel never engages CFI for itself, in which case all
the mentioned instructions are executed as nops. But note that here
they are compiled conditionally, so if you don't compile the kernel
with -march=..._zicfilp_..., they won't be there.
>> +#ifndef __CHERI_PURE_CAPABILITY__
>> + andi $tmp0,$inp,7 # $inp % 8
>> + andi $inp,$inp,-8 # align $inp
>> + slli $tmp0,$tmp0,3 # byte to bit offset
>> +#endif
>> + ld $in0,0($inp)
>> + ld $in1,8($inp)
>> +#ifndef __CHERI_PURE_CAPABILITY__
>> + beqz $tmp0,.Laligned_key
>> +
>> + ld $tmp2,16($inp)
>> + neg $tmp1,$tmp0 # implicit &63 in sll
>> + srl $in0,$in0,$tmp0
>> + sll $tmp3,$in1,$tmp1
>> + srl $in1,$in1,$tmp0
>> + sll $tmp2,$tmp2,$tmp1
>> + or $in0,$in0,$tmp3
>> + or $in1,$in1,$tmp2
>> +
>> +.Laligned_key:
>
> This code is going through a lot of trouble to work on RISC-V CPUs that don't
> support efficient misaligned memory accesses. That includes issuing loads of
> memory outside the bounds of the given buffer, which is questionable (even if
> it's guaranteed to not cross a page boundary).
It's indeed guaranteed not to cross a page boundary, *nor* even a
cache-line boundary. Hence the aligned loads can't trigger any externally
observable side effects that the corresponding unaligned loads wouldn't.
What is the concern otherwise? [Do note that the boundaries are not
crossed on a boundary-enforceable CHERI platform ;-)]
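To illustrate the point about aligned loads, here is a minimal C sketch
of the same shift-and-merge idea as the quoted assembly. It is not the
CRYPTOGAMS code: the name load_key_aligned and its parameters are made
up for this example, and it indexes an array of 64-bit words holding
little-endian data instead of masking a raw pointer, so that the C stays
well defined. When the offset is not a multiple of 8, the third word it
reads always contains at least the last byte of the 16-byte key, which
is why the matching aligned load stays in the same cache line and page
as the data itself.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only.
 *
 * words: 8-byte-aligned buffer, viewed as little-endian 64-bit words
 * off:   byte offset of the 16-byte key inside that buffer
 * out:   receives the two 64-bit key halves
 *
 * If off % 8 != 0, words[off / 8 + 2] must exist. It overlaps the key
 * by at least one byte, so it is never a wholly foreign word.
 */
void load_key_aligned(const uint64_t *words, size_t off, uint64_t out[2])
{
    size_t idx = off / 8;                      /* aligned word index     */
    unsigned shift = (unsigned)(off % 8) * 8;  /* byte to bit offset     */

    uint64_t in0 = words[idx];
    uint64_t in1 = words[idx + 1];

    if (shift != 0) {                          /* the beqz in the asm    */
        uint64_t in2 = words[idx + 2];
        /* The asm uses neg and relies on sll masking the shift amount
         * with 63; in C the equivalent amount is written out. */
        unsigned lshift = 64 - shift;

        in0 = (in0 >> shift) | (in1 << lshift);
        in1 = (in1 >> shift) | (in2 << lshift);
    }

    out[0] = in0;
    out[1] = in1;
}
```

The aligned case is branched around for the same reason as in the
assembly: a shift by 64 would not produce the intended zero.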
> The rest of the kernel's RISC-V crypto code, which is based on the vector
> extension, just assumes that efficient misaligned memory accesses are supported.
Was it tested on real hardware though? I wonder what hardware is out
there that supports the vector crypto extensions?
> On a related topic, if this patch is accepted, the result will be inconsistent
> optimization of ChaCha vs. Poly1305, which are usually paired:
https://github.com/dot-asm/cryptogams/blob/master/riscv/chacha-riscv.pl
> (1) ChaCha optimized with the RISC-V vector extension
> (2) Poly1305 optimized with RISC-V scalar instructions
>
> Surely a RISC-V vector extension optimized Poly1305 is going to be needed too?
I'm a "test-on-hardware" guy. I've got a Spacemit X60, which has a working
256-bit base vector implementation. I have a "teaser" ChaCha vector
implementation that currently performs *worse* than the scalar one, by
more than a factor of two. I'm working on improving it. For reference,
one has to recognize that cryptographic algorithms customarily have short
dependencies, which means that performance is dominated by instruction
latencies. There might or might not be ways to match the scalar
performance. Of course, even if it turns out to be impossible on this
processor, it doesn't mean that it won't make sense to keep the vector
implementation, because other processors might do better. In other
words, it's coming...
> But with that being the case, will a RISC-V scalar optimized Poly1305 actually
> be worthwhile to add too? Especially without optimized ChaCha alongside it?
Yes, because vector implementations are inefficient on short inputs, and
having a compatible scalar fall-back for short inputs is more than
appropriate. In other words, starting with scalar implementations is a
sensible and perfectly meaningful step.
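As a rough sketch of what such a fall-back arrangement could look like,
the C below picks an implementation by input length. Everything in it is
hypothetical: the names choose_impl and VECTOR_MIN_LEN are invented for
this example, and the threshold is a placeholder that would have to be
measured on real hardware, not a value taken from the patch.

```c
#include <stddef.h>

/* Placeholder cut-over length in bytes; an assumption, not a measurement. */
#define VECTOR_MIN_LEN 256

enum impl { IMPL_SCALAR, IMPL_VECTOR };

/*
 * Hypothetical dispatcher: use the vector code only when the CPU has a
 * usable vector unit and the input is long enough to amortize the setup
 * cost; otherwise fall back to the scalar code.
 */
enum impl choose_impl(size_t len, int have_vector)
{
    if (have_vector && len >= VECTOR_MIN_LEN)
        return IMPL_VECTOR;
    return IMPL_SCALAR;
}
```

The scalar path also covers processors without a vector unit, so it is
needed regardless of where the threshold ends up.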
Cheers.