[PATCH v12 4/5] riscv: Add checksum library

Wed Dec 20 02:28:04 PST 2023

> -----Original Message-----
> From: Charlie Jenkins <charlie at rivosinc.com>
> Sent: Wednesday, December 13, 2023 10:11 AM
> To: Palmer Dabbelt <palmer at dabbelt.com>; Conor Dooley
> <conor at kernel.org>; Samuel Holland <samuel.holland at sifive.com>; David
> Laight <David.Laight at aculab.com>; Wang, Xiao W <xiao.w.wang at intel.com>;
> Evan Green <evan at rivosinc.com>; linux-riscv at lists.infradead.org;
> linux-kernel at vger.kernel.org; linux-arch at vger.kernel.org
> Cc: Paul Walmsley <paul.walmsley at sifive.com>; Albert Ou
> <aou at eecs.berkeley.edu>; Arnd Bergmann <arnd at arndb.de>; Conor Dooley
> <conor.dooley at microchip.com>
> Subject: Re: [PATCH v12 4/5] riscv: Add checksum library
> 
> On Tue, Dec 12, 2023 at 05:18:41PM -0800, Charlie Jenkins wrote:
> > Provide a 32 and 64 bit version of do_csum. When compiled for 32-bit,
> > it loads from the buffer in groups of 32 bits, and when compiled for
> > 64-bit, it loads in groups of 64 bits.
> >
> > Additionally provide riscv optimized implementation of csum_ipv6_magic.
> >
> > Signed-off-by: Charlie Jenkins <charlie at rivosinc.com>
> > Acked-by: Conor Dooley <conor.dooley at microchip.com>
> > Reviewed-by: Xiao Wang <xiao.w.wang at intel.com>
> > ---
> >  arch/riscv/include/asm/checksum.h |  13 +-
> >  arch/riscv/lib/Makefile           |   1 +
> >  arch/riscv/lib/csum.c             | 326 ++++++++++++++++++++++++++++++++++++++
> >  3 files changed, 339 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/riscv/include/asm/checksum.h b/arch/riscv/include/asm/checksum.h
> > index 2fcf864186e7..3fa04ff1eda8 100644
> > --- a/arch/riscv/include/asm/checksum.h
> > +++ b/arch/riscv/include/asm/checksum.h
> > @@ -12,6 +12,17 @@
> >
> >  #define ip_fast_csum ip_fast_csum
> >
> > +extern unsigned int do_csum(const unsigned char *buff, int len);
> > +#define do_csum do_csum
> > +
> > +/* Default version is sufficient for 32 bit */
> > +#ifndef CONFIG_32BIT
> > +#define _HAVE_ARCH_IPV6_CSUM
> > +__sum16 csum_ipv6_magic(const struct in6_addr *saddr,
> > +			const struct in6_addr *daddr,
> > +			__u32 len, __u8 proto, __wsum sum);
> > +#endif
> > +
> >  /* Define riscv versions of functions before importing asm-generic/checksum.h */
> >  #include <asm-generic/checksum.h>
> >
> > @@ -69,7 +80,7 @@ static inline __sum16 ip_fast_csum(const void *iph, unsigned int ihl)
> >  			.option pop"
> >  			: [csum] "+r" (csum), [fold_temp] "=&r" (fold_temp));
> >  		}
> > -		return csum >> 16;
> > +		return (__force __sum16) (csum >> 16);

I notice that this type conversion was added after v10. It should go into patch 3/5 instead.

BRs,
Xiao

[...]
> > +
> > +/*
> > + * Perform a checksum on an arbitrary memory address.
> > + * Will do a light-weight address alignment if buff is misaligned, unless
> > + * cpu supports fast misaligned accesses.
> > + */
> > +unsigned int do_csum(const unsigned char *buff, int len)
> > +{
> > +	if (unlikely(len <= 0))
> > +		return 0;
> > +
> > +	/*
> > +	 * Significant performance gains can be seen by not doing alignment
> > +	 * on machines with fast misaligned accesses.
> > +	 *
> > +	 * There is some duplicate code between the "with_alignment" and
> > +	 * "no_alignment" implementations, but the overlap is too awkward to be
> > +	 * able to fit in one function without introducing multiple static
> > +	 * branches. The largest chunk of overlap was delegated into the
> > +	 * do_csum_common function.
> > +	 */
> > +	if (static_branch_likely(&fast_misaligned_access_speed_key))
> > +		return do_csum_no_alignment(buff, len);
> > +
> > +	if (((unsigned long)buff & OFFSET_MASK) == 0)
> > +		return do_csum_no_alignment(buff, len);
> > +
> > +	return do_csum_with_alignment(buff, len);
> > +}
> >
> > --
> > 2.43.0
> >
> 
> There is potentially a code size concern here. These changes do require
> alternatives, and as such it increases the resulting binary size. The
> bloat-o-meter script reports that the do_csum function more than doubles
> in size with this patch:
> 
> Function                                     old     new   delta
> do_csum                                      238     514    +276
> 
> The other functions are harder to measure because they get inlined or
> are not included in generic code. However, do_csum is the most
> impacted because of the misaligned access behavior.
> 
> The performance improvements afforded by alternatives (with the Zbb
> extension) and with the misaligned access checking are significant. In
> my testing these optimizations alone contribute to over a 20% performance
> improvement.
> 
> - Charlie
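
As a reading aid for the thread, here is a minimal userspace sketch of the word-at-a-time summing that the quoted commit message describes for the 64-bit case. It is not the code from arch/riscv/lib/csum.c: the names fold64, csum_words and csum_ref are hypothetical, and memcpy stands in for the direct loads that the patch uses when misaligned accesses are fast. It only illustrates that summing 64-bit groups with an end-around carry, then folding, gives the same 16-bit result as summing 16-bit words one at a time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fold a 64-bit ones' complement accumulator down to 16 bits. */
static uint16_t fold64(uint64_t sum)
{
	sum = (sum & 0xffffffffULL) + (sum >> 32);	/* at most 33 bits */
	sum = (sum & 0xffffffffULL) + (sum >> 32);	/* fits in 32 bits */
	sum = (sum & 0xffffULL) + (sum >> 16);		/* at most 17 bits */
	sum = (sum & 0xffffULL) + (sum >> 16);		/* fits in 16 bits */
	return (uint16_t)sum;
}

/*
 * Hypothetical word-at-a-time sum (assumption: not the kernel's do_csum).
 * memcpy is used for the loads so that any buffer alignment is valid C.
 */
static uint16_t csum_words(const unsigned char *buf, size_t len)
{
	uint64_t sum = 0, w;

	assert(buf != NULL || len == 0);
	while (len >= sizeof(w)) {
		memcpy(&w, buf, sizeof(w));
		sum += w;
		sum += (sum < w);	/* end-around carry */
		buf += sizeof(w);
		len -= sizeof(w);
	}
	if (len) {
		w = 0;
		memcpy(&w, buf, len);	/* tail, zero-padded to 64 bits */
		sum += w;
		sum += (sum < w);
	}
	return fold64(sum);
}

/* Reference: sum 16-bit words one at a time, folding after each add. */
static uint16_t csum_ref(const unsigned char *buf, size_t len)
{
	uint32_t sum = 0;
	uint16_t w;

	while (len >= 2) {
		memcpy(&w, buf, 2);
		sum += w;
		sum = (sum & 0xffff) + (sum >> 16);
		buf += 2;
		len -= 2;
	}
	if (len) {
		w = 0;
		memcpy(&w, buf, 1);	/* odd trailing byte, zero-padded */
		sum += w;
		sum = (sum & 0xffff) + (sum >> 16);
	}
	return (uint16_t)sum;
}
```

Both helpers sum in native byte order starting from buf, so they can be compared directly at any offset and length. The alignment handling and result rotation that do_csum_with_alignment needs for direct loads are deliberately left out of this sketch.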