[PATCH 3/3] riscv: optimized memset

David Laight David.Laight at ACULAB.COM
Thu Feb 1 15:04:48 PST 2024


...
> > +		/* Compose an ulong with 'c' repeated 4/8 times */
> > +#ifdef CONFIG_ARCH_HAS_FAST_MULTIPLIER
> > +		cu *= 0x0101010101010101UL;

That is likely to generate a compile error on 32bit.
Maybe:
		cu *= (unsigned long)0x0101010101010101ULL;
> > +#else
> > +		cu |= cu << 8;
> > +		cu |= cu << 16;
> > +		/* Suppress warning on 32 bit machines */
> > +		cu |= (cu << 16) << 16;
> > +#endif
> 
> I guess you could check against __SIZEOF_LONG__ here.

Or even sizeof (cu), possibly as:
		cu |= cu << (sizeof (cu) == 8 ? 32 : 0);
which I'm pretty sure a modern compiler will throw away for 32bit.
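
For illustration, the two ways of composing the repeated-byte word can be
put side by side. This is only a sketch: the helper names repeat_byte_mul()
and repeat_byte_shift() are made up here and are not part of the patch, and
it assumes 'unsigned long' is either 32 or 64 bits wide.

```c
#include <limits.h>

/* Hypothetical helper: broadcast the low byte of 'c' using a multiply. */
static unsigned long repeat_byte_mul(int c)
{
	unsigned long cu = (unsigned char)c;

	/* The cast truncates the pattern to 0x01010101 when long is 32 bits. */
	cu *= (unsigned long)0x0101010101010101ULL;
	return cu;
}

/* Hypothetical helper: broadcast the low byte of 'c' using shift and or. */
static unsigned long repeat_byte_shift(int c)
{
	unsigned long cu = (unsigned char)c;

	cu |= cu << 8;
	cu |= cu << 16;
	/* Constant shift count: 32 for a 64-bit long, 0 (a no-op) otherwise. */
	cu |= cu << (sizeof(cu) == 8 ? 32 : 0);
	return cu;
}
```

Both helpers should return the same value for every byte, e.g. the low
32 bits for c = 0x5a are 0x5a5a5a5a in either case.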

I do wonder whether CONFIG_ARCH_HAS_FAST_MULTIPLIER is worth
testing - you'd really want to know there is a RISC-V cpu
with a multiply that is slower than the shift-and-or version.
I actually doubt it.
Multiply is used so often (all array indexing) that you
really do need something better than a '1 bit per clock' loop.
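
For reference, a '1 bit per clock' multiplier behaves roughly like the
loop below: one conditional add and one shift per multiplier bit. This is
a software model written for illustration only (mul_shift_add() is a
made-up name), with each loop iteration standing in for one clock.

```c
#include <stdint.h>

/*
 * Hypothetical model of a serial shift-and-add multiplier.
 * Each of the 32 iterations corresponds to one clock cycle.
 * The result is the low 32 bits of a * b.
 */
static uint32_t mul_shift_add(uint32_t a, uint32_t b)
{
	uint32_t acc = 0;
	unsigned int i;

	for (i = 0; i < 32; i++) {
		if (b & 1)
			acc += a;	/* add the shifted multiplicand */
		a <<= 1;		/* next bit has twice the weight */
		b >>= 1;		/* move on to the next multiplier bit */
	}
	return acc;
}
```

A 32-bit multiply done this way always costs 32 steps, against three
shift/or pairs for the byte-replication sequence on a 32-bit machine.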

It is worth remembering that you can implement an n*n multiply
with n*n 'full adders' (3 input bits, 2 output bits) with a
latency of 2*n adders.
So the latency is only twice that of the corresponding add.
For a modern chip that is not much logic at all.
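
The full-adder array can be modelled bit by bit. The sketch below is one
possible arrangement (a carry-save array followed by a ripple-carry add),
assumed here for illustration; real multipliers are laid out differently
and the names are made up. For simplicity each row is modelled 2*N bits
wide, so it uses more adders than the n*n a real array needs.

```c
#include <stdint.h>

#define N 16	/* operand width in bits; the product is 2*N bits */

/* One full adder: 3 input bits, 2 output bits (sum and carry). */
static void full_adder(unsigned int a, unsigned int b, unsigned int c,
		       unsigned int *sum, unsigned int *carry)
{
	*sum = a ^ b ^ c;
	*carry = (a & b) | (a & c) | (b & c);
}

/* Hypothetical bit-level model of an N x N array multiplier. */
static uint32_t array_multiply(uint16_t a, uint16_t b)
{
	uint32_t s = 0, c = 0, result = 0;
	unsigned int row, bit, sum, carry;

	/* N rows of full adders: each row absorbs one partial product. */
	for (row = 0; row < N; row++) {
		uint32_t pp = ((b >> row) & 1) ? (uint32_t)a << row : 0;
		uint32_t ns = 0, nc = 0;

		for (bit = 0; bit < 2 * N; bit++) {
			full_adder((s >> bit) & 1, (c >> bit) & 1,
				   (pp >> bit) & 1, &sum, &carry);
			ns |= (uint32_t)sum << bit;
			if (bit + 1 < 2 * N)
				nc |= (uint32_t)carry << (bit + 1);
		}
		s = ns;
		c = nc;
	}

	/* Final ripple-carry add of the saved sum and carry words. */
	carry = 0;
	for (bit = 0; bit < 2 * N; bit++) {
		full_adder((s >> bit) & 1, (c >> bit) & 1, carry,
			   &sum, &carry);
		result |= (uint32_t)sum << bit;
	}
	return result;
}
```

Within a row the adders do not depend on each other, so the rows add
about n adder delays and the final carry ripple adds roughly another n.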

	David


