[PATCH v4] crypto: riscv/poly1305 - import OpenSSL/CRYPTOGAMS implementation

Eric Biggers ebiggers at kernel.org
Tue Jun 24 20:54:46 PDT 2025


On Tue, Jun 24, 2025 at 11:13:49AM +0200, Andy Polyakov wrote:
> > > +.globl	poly1305_init
> > > +.type	poly1305_init,\@function
> > > +poly1305_init:
> > > +#ifdef	__riscv_zicfilp
> > > +	lpad	0
> > > +#endif
> > 
> > The 'lpad' instructions aren't present in the upstream CRYPTOGAMS source.
> 
> They are.

I now see that the latest version does have them.  However, the description of
this patch explicitly states that it was taken from CRYPTOGAMS commit
33fe84bc21219a16825459b37c825bf4580a0a7b, which is of course the one I looked
at, and that commit does not have them.  So the patch description is wrong.

> > If they are necessary, this addition needs to be documented.
> > 
> > But they appear to be unnecessary.
> 
> They are better be there if Control Flow Integrity is on. It's the same deal
> as with endbranch instruction on Intel and hint #34 on ARM. It's possible
> that the kernel never engages CFI for itself, in which case all the
> mentioned instructions are executed as nop-s. But note that here they are
> compiled conditionally, so that if you don't compile the kernel with
> -march=..._zicfilp_..., then they won't be there.

There appears to be no kernel-mode support for Zicfilp yet.  This would be the
very first occurrence of the lpad instruction in the kernel source.

Keeping this anyway is fine, given that it's in the upstream version now.  It
just appeared to be an undocumented change from upstream...

> > > +#ifndef	__CHERI_PURE_CAPABILITY__
> > > +	andi	$tmp0,$inp,7		# $inp % 8
> > > +	andi	$inp,$inp,-8		# align $inp
> > > +	slli	$tmp0,$tmp0,3		# byte to bit offset
> > > +#endif
> > > +	ld	$in0,0($inp)
> > > +	ld	$in1,8($inp)
> > > +#ifndef	__CHERI_PURE_CAPABILITY__
> > > +	beqz	$tmp0,.Laligned_key
> > > +
> > > +	ld	$tmp2,16($inp)
> > > +	neg	$tmp1,$tmp0		# implicit &63 in sll
> > > +	srl	$in0,$in0,$tmp0
> > > +	sll	$tmp3,$in1,$tmp1
> > > +	srl	$in1,$in1,$tmp0
> > > +	sll	$tmp2,$tmp2,$tmp1
> > > +	or	$in0,$in0,$tmp3
> > > +	or	$in1,$in1,$tmp2
> > > +
> > > +.Laligned_key:
> > 
> > This code is going through a lot of trouble to work on RISC-V CPUs that don't
> > support efficient misaligned memory accesses.  That includes issuing loads of
> > memory outside the bounds of the given buffer, which is questionable (even if
> > it's guaranteed to not cross a page boundary).
> 
> It's indeed guaranteed to not cross a page *nor* even cache-line boundaries.
> Hence they can't trigger any externally observed side effects the
> corresponding unaligned loads won't. What is the concern otherwise? [Do note
> that the boundaries are not crossed on a boundary-enforcable CHERI platform
> ;-)]

With this, we get:

- More complex code.
- Slower code on CPUs that do support efficient misaligned accesses.
- Buffer underflows and overflows that could cause problems with future CPU
  behavior.  (Did you consider the palette memory extension, for example?)

That being said, if there are going to continue to be many RISC-V CPUs that
don't support efficient misaligned accesses, then we effectively need to do
this anyway.  I had hoped that things would develop along the same lines as
ARM, where misaligned accesses eventually came to be supported uniformly.  But
perhaps RISC-V is still in the process of learning that lesson.
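
For anyone following along, here is roughly what the quoted
aligned-load-and-shift sequence is doing, written as a C sketch.  This is
purely illustrative and not part of the patch; the function name, the pointer
casts, and the little-endian assumption are mine:

#include <stdint.h>

/*
 * Illustration only: roughly what the quoted assembly does when loading a
 * potentially misaligned 16-byte key on a CPU without fast misaligned
 * loads.  Assumes a little-endian CPU, as RISC-V is.
 */
static void load_key_128(const unsigned char *inp, uint64_t *in0, uint64_t *in1)
{
	uintptr_t addr = (uintptr_t)inp;
	unsigned int bitoff = (unsigned int)(addr & 7) * 8;  /* byte to bit offset */

	/*
	 * Round the pointer down to an 8-byte boundary, mirroring the ld
	 * instructions in the assembly.  (Portable C would use memcpy()
	 * instead of this cast; it's written this way only to match the asm.)
	 */
	const uint64_t *p = (const uint64_t *)(addr & ~(uintptr_t)7);
	uint64_t lo = p[0];
	uint64_t mid = p[1];

	if (bitoff == 0) {		/* .Laligned_key: already 8-byte aligned */
		*in0 = lo;
		*in1 = mid;
		return;
	}

	/*
	 * p[2] begins strictly inside the 16-byte key (because bitoff != 0)
	 * and is 8-byte aligned, so it reads at most 7 bytes past the end of
	 * the key but stays within the aligned word that already contains the
	 * key's last byte -- no extra cache line or page is touched.
	 */
	uint64_t hi = p[2];

	*in0 = (lo >> bitoff) | (mid << (64 - bitoff));
	*in1 = (mid >> bitoff) | (hi << (64 - bitoff));
}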

> > The rest of the kernel's RISC-V crypto code, which is based on the vector
> > extension, just assumes that efficient misaligned memory accesses are supported.
> 
> Was it tested on real hardware though? I wonder what hardware is out there
> that supports the vector crypto extensions?

If I remember correctly, SiFive tested it on their hardware.

- Eric


