[PATCH] riscv: lib: Optimize 'strlen' function
David Laight
David.Laight at ACULAB.COM
Sun Dec 17 10:10:54 PST 2023
From: Ivan Orlov
> Sent: 13 December 2023 15:46
Looking at the old code...
> 1:
> - lbu t0, 0(t1)
> - beqz t0, 2f
> - addi t1, t1, 1
> - j 1b
I suspect there is (at least) a two clock stall between
the 'lbu' and 'beqz'.
Allowing for one clock for the 'predicted taken' branch
that is 7 clocks/byte (4 instructions + 2 stall + 1 branch).
Try this one - especially on 32-bit:
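    mv   t0, a0
    andi t1, t0, 1      # t1 = 1 if the start address is odd
    sub  t0, t0, t1     # round t0 down to an even address
    bnez t1, 2f         # odd start: skip the byte before the string
1:
    lbu  t1, 0(t0)
2:  lbu  t2, 1(t0)
    addi t0, t0, 2
    beqz t1, 3f         # NUL in the even byte
    bnez t2, 1b         # no NUL in either byte, go round again
    addi t0, t0, 1      # NUL in the odd byte
3:  addi t0, t0, -2     # t0 now points at the NUL
    sub  a0, t0, a0     # length = NUL address - start address
    ret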
Might be 6 clocks for 2 bytes.
The much smaller cache footprint will also help.
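
For anyone wanting to check the logic of the two-byte loop without
hardware to hand, it corresponds roughly to the C model below. This is
only a sketch: the names strlen_pairs and strlen_pairs_selfcheck are made
up here, and it says nothing about timing. Like the assembly, it may read
one byte either side of the string (the other half of an aligned 2-byte
pair). That cannot cross a page boundary, but portable C does not
guarantee it is allowed, so the self-check keeps the strings inside a
larger buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C model of a strlen that tests two bytes per iteration. */
size_t strlen_pairs(const char *s)
{
    const unsigned char *start = (const unsigned char *)s;
    uintptr_t odd = (uintptr_t)s & 1;
    /* Round the start down to an even address (assumes the usual flat
     * pointer/integer mapping). */
    const unsigned char *p = start - odd;
    /* For an odd start the byte before the string is never loaded;
     * a non-zero placeholder stands in for it. */
    unsigned char b0 = odd ? 1 : p[0];
    unsigned char b1;

    for (;;) {
        b1 = p[1];              /* both loads happen before either test */
        p += 2;
        if (b0 == 0)
            return (size_t)(p - 2 - start);  /* NUL in the even byte */
        if (b1 == 0)
            return (size_t)(p - 1 - start);  /* NUL in the odd byte */
        b0 = p[0];
    }
}

/* Covers both start parities whatever the alignment of buf. */
void strlen_pairs_selfcheck(void)
{
    char buf[8] = "\0abc";      /* remaining bytes are zero */

    assert(strlen_pairs(buf + 1) == 3);
    assert(strlen_pairs(buf + 2) == 2);
}
```

Calling strlen_pairs_selfcheck() from a test harness exercises both the
even-start and odd-start paths, since the two start addresses differ by one.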
David