[PATCH v3 1/3] riscv: optimized memcpy

Mon Jun 21 07:26:31 PDT 2021

On Thu, Jun 17, 2021 at 05:27:52PM +0200, Matteo Croce wrote:
> +extern void *memcpy(void *dest, const void *src, size_t count);
> +extern void *__memcpy(void *dest, const void *src, size_t count);

No need for externs.

> +++ b/arch/riscv/lib/string.c

Nothing in here looks RISC-V specific.  Why doesn't this go into lib/ so
that other architectures can use it as well?

> +#include <linux/module.h>

I think you only need export.h.

> +void *__memcpy(void *dest, const void *src, size_t count)
> +{
> +	const int bytes_long = BITS_PER_LONG / 8;
> +#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
> +	const int mask = bytes_long - 1;
> +	const int distance = (src - dest) & mask;
> +#endif
> +	union const_types s = { .u8 = src };
> +	union types d = { .u8 = dest };
> +
> +#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
> +	if (count < MIN_THRESHOLD)

Using IS_ENABLED we can avoid a lot of the mess in this
function.

	int distance = 0;

	if (!IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)) {
		if (count < MIN_THRESHOLD)
			goto copy_remainder;

		/* copy a byte at a time until destination is aligned */
		for (; count && d.uptr & mask; count--)
			*d.u8++ = *s.u8++;
		distance = (src - dest) & mask;
	}

	if (distance) {
		...
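
To illustrate the shape, here is a minimal user-space sketch of the same
control flow.  It is not the patch: EFFICIENT_UNALIGNED_ACCESS is a
hypothetical stand-in for IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS),
the MIN_THRESHOLD value is assumed, sketch_memcpy is an invented name, and
the misaligned-source case simply falls back to the byte loop.  Fixed-size
memcpy calls stand in for word loads and stores to keep the example free of
aliasing problems.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the Kconfig symbol and the patch's constant. */
#define EFFICIENT_UNALIGNED_ACCESS 0
#define MIN_THRESHOLD 16	/* assumed value, not taken from the patch */

static void *sketch_memcpy(void *dest, const void *src, size_t count)
{
	const uintptr_t mask = sizeof(unsigned long) - 1;
	unsigned char *d = dest;
	const unsigned char *s = src;
	uintptr_t distance = 0;

	assert(dest && src);

	/* Constant condition: both branches are compiled, one is discarded. */
	if (!EFFICIENT_UNALIGNED_ACCESS) {
		if (count < MIN_THRESHOLD)
			goto copy_remainder;

		/* copy a byte at a time until destination is aligned */
		for (; count && ((uintptr_t)d & mask); count--)
			*d++ = *s++;
		distance = ((uintptr_t)s - (uintptr_t)d) & mask;
	}

	if (!distance) {
		/* source and destination agree on alignment: copy whole words */
		while (count >= sizeof(unsigned long)) {
			unsigned long w;

			memcpy(&w, s, sizeof(w));
			memcpy(d, &w, sizeof(w));
			s += sizeof(w);
			d += sizeof(w);
			count -= sizeof(w);
		}
	}
	/* distance != 0 is left to the byte loop in this sketch */

copy_remainder:
	while (count--)
		*d++ = *s++;
	return dest;
}
```

The point of the constant-condition form is that the compiler still parses
and type-checks the unaligned path on every configuration, which an #ifndef
block does not give you.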

> +		/* 32/64 bit wide copy from s to d.
> +		 * d is aligned now but s is not, so read s alignment wise,
> +		 * and do proper shift to get the right value.
> +		 * Works only on Little Endian machines.
> +		 */

Normal kernel comment style always starts with a:

		/*

> +		for (next = s.ulong[0]; count >= bytes_long + mask; count -= bytes_long) {

Please avoid the pointlessly overlong line.  And (just as a matter of
personal preference) I find for loops that don't actually use a single
iterator rather confusing.  Why not simply:

		next = s.ulong[0];
		while (count >= bytes_long + mask) {
			...
			count -= bytes_long;
		}
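
For reference, a self-contained user-space sketch of a shifted word copy
written in that while-loop form.  The loop body here is my own illustration,
not the body elided above, and shifted_copy is a hypothetical name.  It
assumes a little-endian machine, a distance between 1 and sizeof(long) - 1,
and that the `distance` bytes just before the source pointer are readable
(the rounded-down first load touches them).  Fixed-size memcpy calls stand in
for the aligned word loads and stores.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy whole words to d from a source that sits `distance` bytes past a
 * word boundary.  Little-endian only.  Returns the number of bytes copied;
 * the caller handles the remaining tail.
 */
static size_t shifted_copy(unsigned char *d, const unsigned char *s,
			   size_t count, unsigned int distance)
{
	const size_t bytes_long = sizeof(unsigned long);
	const size_t mask = bytes_long - 1;
	const unsigned char *aligned = s - distance;	/* round s down */
	unsigned long last, next;
	size_t copied = 0;

	/* distance == 0 would make one of the shifts a full-width shift */
	assert(distance >= 1 && distance < bytes_long);

	memcpy(&next, aligned, bytes_long);
	while (count >= bytes_long + mask) {
		last = next;
		memcpy(&next, aligned + bytes_long, bytes_long);
		/* high bytes of the old word, low bytes of the new one */
		last = last >> (distance * 8) |
		       next << ((bytes_long - distance) * 8);
		memcpy(d, &last, bytes_long);

		d += bytes_long;
		aligned += bytes_long;
		copied += bytes_long;
		count -= bytes_long;
	}
	return copied;
}
```

The `bytes_long + mask` bound keeps the second load of each iteration inside
the source buffer, since that load can reach up to `mask` bytes beyond the
word being written.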