[PATCH v3 1/3] riscv: optimized memcpy

Matteo Croce mcroce at linux.microsoft.com
Tue Jun 22 16:35:58 PDT 2021


On Tue, Jun 22, 2021 at 2:15 AM Nick Kossifidis <mick at ics.forth.gr> wrote:
>
> Hello Matteo and thanks for the patch,
>
> On 2021-06-17 18:27, Matteo Croce wrote:
> >
> > @@ -0,0 +1,91 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/*
> > + * String functions optimized for hardware which doesn't
> > + * handle unaligned memory accesses efficiently.
> > + *
> > + * Copyright (C) 2021 Matteo Croce
> > + */
> > +
> > +#include <linux/types.h>
> > +#include <linux/module.h>
> > +
> > +/* Minimum size for a word copy to be convenient */
> > +#define MIN_THRESHOLD (BITS_PER_LONG / 8 * 2)
> > +
> > +/* convenience union to avoid casts between different pointer types */
> > +union types {
> > +     u8 *u8;
>
> You are using a type as a name; I'd go with as_bytes/as_ulong/as_uptr,
> which makes it easier for the reader to understand what you are trying
> to do.
>

Makes sense
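
Something like this, then (a quick sketch with your suggested names; I
also made the ulong member const in const_types while at it, which the
current version is missing):

union types {
	u8 *as_bytes;
	unsigned long *as_ulong;
	uintptr_t as_uptr;
};

union const_types {
	const u8 *as_bytes;
	const unsigned long *as_ulong;
};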

> > +     unsigned long *ulong;
> > +     uintptr_t uptr;
> > +};
> > +
> > +union const_types {
> > +     const u8 *u8;
> > +     unsigned long *ulong;
> > +};
> > +
>
> I suggest you define those unions inside the function body; no one else
> is using them.
>

They will be used in memset(), in patch 3/3.
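
For a rough idea of the reuse (hypothetical sketch, the details in the
actual patch 3/3 may differ):

void *__memset(void *s, int c, size_t count)
{
	union types dest = { .u8 = s };

	if (count >= MIN_THRESHOLD) {
		const int bytes_long = BITS_PER_LONG / 8;
		unsigned long cu = (unsigned long)(u8)c;

		/* replicate the fill byte over a full word */
		cu |= cu << 8;
		cu |= cu << 16;
#if BITS_PER_LONG == 64
		cu |= cu << 32;
#endif

		/* set a byte at a time until the destination is aligned */
		for (; count && dest.uptr & (bytes_long - 1); count--)
			*dest.u8++ = c;

		/* word-wide stores */
		for (; count >= bytes_long; count -= bytes_long)
			*dest.ulong++ = cu;
	}

	while (count--)
		*dest.u8++ = c;

	return s;
}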

> > +void *__memcpy(void *dest, const void *src, size_t count)
> > +{
> > +     const int bytes_long = BITS_PER_LONG / 8;
> > +#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
> > +     const int mask = bytes_long - 1;
> > +     const int distance = (src - dest) & mask;
>
> Why not unsigned ints?
>

Ok.
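
i.e. just:

	const unsigned int mask = bytes_long - 1;
	const unsigned int distance = (src - dest) & mask;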

> > +#endif
> > +     union const_types s = { .u8 = src };
> > +     union types d = { .u8 = dest };
> > +
> > +#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
>
> If you want to be compliant with memcpy you should check for overlapping
> regions here since "The memory areas must not overlap", and do nothing
> about it because according to POSIX this leads to undefined behavior.
> That's why recent libc implementations use memmove in any case (memcpy
> is an alias to memmove), which is the suggested approach.
>

Hmm, which arch memcpy implementation does this check?
I guess no one is currently doing it.
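
If we wanted to handle it anyway, a check at function entry could punt
to memmove() (a hypothetical sketch; overlapping memcpy is undefined
behavior, so this would be a policy choice, not a requirement):

	uintptr_t du = (uintptr_t)dest, su = (uintptr_t)src;

	/* the regions overlap iff the pointers are less than count apart */
	if (du - su < count || su - du < count)
		return memmove(dest, src, count);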

> > +     if (count < MIN_THRESHOLD)
> > +             goto copy_remainder;
> > +
> > +     /* copy a byte at a time until destination is aligned */
> > +     for (; count && d.uptr & mask; count--)
> > +             *d.u8++ = *s.u8++;
> > +
>
> You should check for !IS_ENABLED(CONFIG_CPU_BIG_ENDIAN) here.
>

I thought that only little-endian RISC-V machines were supported in Linux.
Should I add a BUILD_BUG_ON()?
Anyway, if this goes into generic lib/, I'll take care of the endianness.
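
A one-liner would at least make the assumption explicit (sketch; the
shifted word merge below is only correct on little endian):

	/* the misaligned word merge below assumes little-endian layout */
	BUILD_BUG_ON(IS_ENABLED(CONFIG_CPU_BIG_ENDIAN));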

> > +     if (distance) {
> > +             unsigned long last, next;
> > +
> > +             /* move s backward to the previous alignment boundary */
> > +             s.u8 -= distance;
>
> It'd help here to explain that since s is distance bytes ahead relative
> to d, and d reached the alignment boundary above, s is now aligned, but
> the data needs to be shifted to compensate for distance in order to do
> a word-by-word copy.
>
> > +
> > +             /* 32/64 bit wide copy from s to d.
> > +              * d is aligned now but s is not, so read s alignment wise,
> > +              * and do proper shift to get the right value.
> > +              * Works only on Little Endian machines.
> > +              */
>
> This comment is misleading because s is aligned here; otherwise
> s.ulong[0]/[1] below would result in an unaligned access.
>

Yes, those two comments should be rephrased, merged and put above.
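
For the record, what the merged comment should convey, worked out for
64-bit little endian with distance = 3:

	/*
	 * bytes_long = 8, distance = 3
	 * last = aligned load covering src[-3..4]
	 * next = aligned load covering src[5..12]
	 *
	 * last >> (3 * 8)  puts src[0..4] in the low 5 bytes
	 * next << (5 * 8)  puts src[5..7] in the high 3 bytes
	 * OR-ed together they yield exactly src[0..7], obtained
	 * with two aligned loads
	 */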

> > +             for (next = s.ulong[0]; count >= bytes_long + mask; count -= bytes_long) {
> > +                     last = next;
> > +                     next = s.ulong[1];
> > +
> > +                     d.ulong[0] = last >> (distance * 8) |
> > +                                  next << ((bytes_long - distance) * 8);
> > +
> > +                     d.ulong++;
> > +                     s.ulong++;
> > +             }
> > +
> > +             /* restore s with the original offset */
> > +             s.u8 += distance;
> > +     } else
> > +#endif
> > +     {
> > +             /* if the source and dest lower bits are the same, do a simple
> > +              * 32/64 bit wide copy.
> > +              */
>
> A while() loop would make more sense here.
>

Ok.
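
i.e.:

		/* the pointers are now co-aligned (or unaligned access
		 * is cheap), so copy word by word
		 */
		while (count >= bytes_long) {
			*d.ulong++ = *s.ulong++;
			count -= bytes_long;
		}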

> > +             for (; count >= bytes_long; count -= bytes_long)
> > +                     *d.ulong++ = *s.ulong++;
> > +     }
> > +
> > +     /* suppress warning when CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y */
> > +     goto copy_remainder;
> > +
> > +copy_remainder:
> > +     while (count--)
> > +             *d.u8++ = *s.u8++;
> > +
> > +     return dest;
> > +}
> > +EXPORT_SYMBOL(__memcpy);
> > +
> > +void *memcpy(void *dest, const void *src, size_t count) __weak __alias(__memcpy);
> > +EXPORT_SYMBOL(memcpy);

Regards,
-- 
per aspera ad upstream


