[PATCH] riscv: lib: Optimize 'strlen' function

Sun Dec 17 09:00:48 PST 2023

From: Ivan Orlov
> Sent: 13 December 2023 15:46
> 
> The current non-ZBB implementation of 'strlen' function iterates the
> memory bytewise, looking for a zero byte. It could be optimized to use
> the wordwise iteration instead, so we will process 4/8 bytes of memory
> at a time.
...
> 1. If the address is unaligned, iterate SZREG - (address % SZREG) bytes
> to align it.

An alternative is to mask the address and 'or' in non-zero bytes
into the first word - might be faster.

...
> Here you can find the benchmarking results for the VisionFive2 board
> comparing the old and new implementations of the strlen function.
> 
> Size: 1 (+-0), mean_old: 673, mean_new: 666
> Size: 2 (+-0), mean_old: 672, mean_new: 676
> Size: 4 (+-0), mean_old: 685, mean_new: 659
> Size: 8 (+-0), mean_old: 682, mean_new: 673
> Size: 16 (+-0), mean_old: 718, mean_new: 694
...

Is that 32bit or 64bit?
The word-at-a-time strlen() is typically not worth it for 32bit.

I'd also guess that pretty much all the calls in-kernel are short.
You might try counting as: histogram[ilog2(strlen_result)]++
and seeing what it shows for some workload.
I bet you (a beer if I see you!) that you won't see many over 1k.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)