[PATCH] riscv: lib: Optimize 'strlen' function
Ivan Orlov
ivan.orlov0322 at gmail.com
Mon Dec 18 02:03:21 PST 2023
On 12/18/23 09:20, David Laight wrote:
> From: Ivan Orlov
>> Sent: 18 December 2023 01:42
>>
>> On 12/17/23 17:00, David Laight wrote:
>>> I'd also guess that pretty much all the calls in-kernel are short.
>>> You might try counting as: histogram[ilog2(strlen_result)]++
>>> and seeing what it shows for some workload.
>>> I bet you (a beer if I see you!) that you won't see many over 1k.
>>
>> Hi David,
>>
>> Here is the statistics for strlen result:
>>
>> [ 223.169575] Calls count for 2^0: 6150
>> [ 223.173293] Calls count for 2^1: 184852
>> [ 223.177142] Calls count for 2^2: 313896
>> [ 223.180990] Calls count for 2^3: 185844
>> [ 223.184881] Calls count for 2^4: 87868
>> [ 223.188660] Calls count for 2^5: 9916
>> [ 223.192368] Calls count for 2^6: 1865
>> [ 223.196062] Calls count for 2^7: 0
>> [ 223.199483] Calls count for 2^8: 0
>> [ 223.202952] Calls count for 2^9: 0
>> ...
>>
>> Looks like I've just lost a beer :)
>>
>> Considering this statistics, I'd say implementing the word-oriented
>> strlen is an overcomplication - we wouldn't get any performance gain and
>> it just doesn't worth it.
>
> And the 32bit version is about half the speed of the 64bit one.
>
> Of course, the fast way to do strlen is add a custom instruction!
>
>> I simplified your code a little bit, it looks like the alignment there
>> is unnecessary: QEMU test shows the same performance independently from
>> alignment. Tests on the board gave the same result (perhaps because the
>> CPU on the board has 2 DDR channels?)
>
> The alignment is there because it can overread the string end
> by one byte - and that mustn't cross a page boundary.
> So you either have to mark the second load as 'may fault return
> zero' or just not do it.
>
> If the data isn't in cache the cache load will dominate.
> The DDR channels only affect cache load times.
> Get a TLB miss and add a few thousand more clocks!
>
Ah, right, sounds reasonable...
Overall, I believe your solution is better and it would be more fair if
you send it as a patch :) Here is benchmark results for your version vs
the original (the old) one on the Starfive VisionFive2 RISC-V board:
Size: 1 (+-0), mean_old: 350, mean_new: 340
Size: 2 (+-0), mean_old: 337, mean_new: 347
Size: 4 (+-0), mean_old: 322, mean_new: 355
Size: 8 (+-0), mean_old: 345, mean_new: 335
Size: 16 (+-0), mean_old: 352, mean_new: 367
Size: 32 (+-0), mean_old: 425, mean_new: 362
Size: 64 (+-4), mean_old: 507, mean_new: 407
Size: 128 (+-10), mean_old: 730, mean_new: 442
Size: 256 (+-19), mean_old: 1142, mean_new: 592
Size: 512 (+-6), mean_old: 1945, mean_new: 812
Size: 1024 (+-21), mean_old: 3565, mean_new: 1312
Size: 2048 (+-108), mean_old: 6812, mean_new: 2280
Size: 4096 (+-362), mean_old: 13302, mean_new: 4242
Size: 8192 (+-385), mean_old: 26393, mean_new: 8160
Size: 16384 (+-1115), mean_old: 52689, mean_new: 15953
Size: 32768 (+-2515), mean_old: 107293, mean_new: 32391
Size: 65536 (+-6041), mean_old: 213789, mean_new: 74354
Size: 131072 (+-12352), mean_old: 426619, mean_new: 146972
Size: 262144 (+-2635), mean_old: 848115, mean_new: 291309
Size: 524288 (+-3336), mean_old: 1712847, mean_new: 589654
>>
>> --
>> Kind regards,
>> Ivan Orlov
>
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)
--
Kind regards,
Ivan Orlov
More information about the linux-riscv
mailing list