[PATCH] riscv: lib: Optimize 'strlen' function

David Laight David.Laight at ACULAB.COM
Sun Dec 17 10:10:54 PST 2023


From: Ivan Orlov
> Sent: 13 December 2023 15:46

Looking at the old code...

>  1:
> -	lbu	t0, 0(t1)
> -	beqz	t0, 2f
> -	addi	t1, t1, 1
> -	j	1b

I suspect there is (at least) a two clock stall between
the 'lbu' and 'beqz'.
Allowing for one clock for the 'predicted taken' branch,
that is 7 clocks/byte.

Try this one - especially on 32bit:

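	mv	t0, a0
	andi	t1, t0, 1		# t1 = 1 if the start address is odd
	sub	t0, t0, t1		# round t0 down to a 2-byte boundary
	bnez	t1, 2f			# odd start: skip the byte before the string
1:
	lbu	t1, 0(t0)
2:	lbu	t2, 1(t0)
	addi	t0, t0, 2
	beqz	t1, 3f			# NUL in the even byte
	bnez	t2, 1b			# no NUL in the odd byte: next pair
	addi	t0, t0, 1		# NUL in the odd byte
3:	addi	t0, t0, -2
	sub	a0, t0, a0		# length = NUL address - start
	ret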

Might be 6 clocks for 2 bytes.
The much smaller cache footprint will also help.
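
For anyone who wants to check the control flow without a RISC-V
toolchain, here is a rough C model of the same loop. It is only a
sketch: the name strlen_by_pairs is made up for illustration, and it
assumes (as the assembly does) that reading the second byte of an
aligned 2-byte pair is safe even when the NUL is the first byte.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C model of the two-bytes-per-iteration loop; not kernel code. */
static size_t strlen_by_pairs(const char *s)
{
	uintptr_t start = (uintptr_t)s;
	uintptr_t odd = start & 1;
	/* Round down to a 2-byte boundary, like 'sub t0, t0, t1'. */
	const unsigned char *p = (const unsigned char *)(start - odd);
	unsigned char b0, b1;

	/* Odd start: p[0] lies before the string, so treat it as non-zero. */
	b0 = odd ? 1 : p[0];
	for (;;) {
		/* May read one byte past the NUL, but never past the aligned pair. */
		b1 = p[1];
		p += 2;
		if (b0 == 0)	/* NUL was the even byte of the pair */
			return (uintptr_t)p - 2 - start;
		if (b1 == 0)	/* NUL was the odd byte of the pair */
			return (uintptr_t)p - 1 - start;
		b0 = p[0];
	}
}
```

Calling it on a buffer and on the same buffer plus one exercises both
the aligned and the misaligned entry paths.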

	David




