[PATCH v3 1/3] riscv: optimized memcpy
Matteo Croce
mcroce at linux.microsoft.com
Tue Jun 22 15:00:06 PDT 2021
On Mon, Jun 21, 2021 at 4:26 PM Christoph Hellwig <hch at infradead.org> wrote:
>
> On Thu, Jun 17, 2021 at 05:27:52PM +0200, Matteo Croce wrote:
> > +extern void *memcpy(void *dest, const void *src, size_t count);
> > +extern void *__memcpy(void *dest, const void *src, size_t count);
>
> No need for externs.
>
Right.
> > +++ b/arch/riscv/lib/string.c
>
> Nothing in here looks RISC-V specific. Why doesn't this go into lib/ so
> that other architectures can use it as well?
>
Technically it could go into lib/ and be generic.
If you think it's worth it, I just have to handle the different
left/right shifts depending on endianness.
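Roughly, the only endian-dependent part is the word merge, something like
this (just a sketch, not tested; 'last' is whatever variable ends up holding
the previously loaded word, and checking CONFIG_CPU_BIG_ENDIAN is just one
way to pick the shift direction):

        /* Merge two naturally aligned words loaded from src into one
         * aligned store to dest. 'distance' is the byte offset of src
         * within a word once dest has been aligned.
         */
        if (IS_ENABLED(CONFIG_CPU_BIG_ENDIAN))
                d.ulong[0] = last << (distance * 8) |
                             next >> ((bytes_long - distance) * 8);
        else
                d.ulong[0] = last >> (distance * 8) |
                             next << ((bytes_long - distance) * 8);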
> > +#include <linux/module.h>
>
> I think you only need export.h.
>
Nice.
> > +void *__memcpy(void *dest, const void *src, size_t count)
> > +{
> > + const int bytes_long = BITS_PER_LONG / 8;
> > +#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
> > + const int mask = bytes_long - 1;
> > + const int distance = (src - dest) & mask;
> > +#endif
> > + union const_types s = { .u8 = src };
> > + union types d = { .u8 = dest };
> > +
> > +#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
> > + if (count < MIN_THRESHOLD)
>
> Using IS_ENABLED we can avoid a lot of the mess in this
> function.
>
> int distance = 0;
>
> if (!IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)) {
> if (count < MIN_THRESHOLD)
> goto copy_remainder;
>
> /* copy a byte at a time until destination is aligned */
> for (; count && d.uptr & mask; count--)
> *d.u8++ = *s.u8++;
> distance = (src - dest) & mask;
> }
>
Cool. What about putting this check at the very start:
if (count < MIN_THRESHOLD)
goto copy_remainder;
And since count is at least twice bytes_long, remove count from the check below?
Also, setting distance after d is aligned is as simple as getting the
lower bits of s:
if (!IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)) {
/* Copy a byte at a time until destination is aligned. */
for (; d.uptr & mask; count--)
*d.u8++ = *s.u8++;
distance = s.uptr & mask;
}
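Putting the two changes together, the prologue would read something like this
(just a sketch, not tested), ending right where your if (distance) split picks
up:

void *__memcpy(void *dest, const void *src, size_t count)
{
        const int bytes_long = BITS_PER_LONG / 8;
        const int mask = bytes_long - 1;
        union const_types s = { .u8 = src };
        union types d = { .u8 = dest };
        int distance = 0;

        if (count < MIN_THRESHOLD)
                goto copy_remainder;

        if (!IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)) {
                /* Copy a byte at a time until destination is aligned. */
                for (; d.uptr & mask; count--)
                        *d.u8++ = *s.u8++;

                distance = s.uptr & mask;
        }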
> if (distance) {
> ...
>
> > + /* 32/64 bit wide copy from s to d.
> > + * d is aligned now but s is not, so read s alignment wise,
> > + * and do proper shift to get the right value.
> > + * Works only on Little Endian machines.
> > + */
>
> Normal kernel comment style always starts with a:
>
Right, I was used to netdev ones :)
> /*
>
>
> > + for (next = s.ulong[0]; count >= bytes_long + mask; count -= bytes_long) {
>
> Please avoid the pointlessly overlong line. And (just as a matter of
> personal preference) I find for loops that don't actually use a single
> iterator rather confusing. Why not simply:
>
> next = s.ulong[0];
> while (count >= bytes_long + mask) {
> ...
> count -= bytes_long;
> }
My fault, in a previous version it was:
next = s.ulong[0];
for (; count >= bytes_long + mask; count -= bytes_long) {
The idea was to have a single `count` counter for the loop.
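With the shifts spelled out, the little endian version of that loop looks
more or less like this (sketch only, variable names may differ; 'last' holds
the word loaded in the previous iteration, and s has to be moved back by
'distance' bytes beforehand so the loads stay aligned):

        next = s.ulong[0];
        for (; count >= bytes_long + mask; count -= bytes_long) {
                last = next;
                next = s.ulong[1];

                d.ulong[0] = last >> (distance * 8) |
                             next << ((bytes_long - distance) * 8);

                d.ulong++;
                s.ulong++;
        }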
Regards,
--
per aspera ad upstream