[PATCH 0/3] riscv: word-at-a-time: improve find_zero()
Jisheng Zhang
jszhang at kernel.org
Tue Jan 13 04:24:54 PST 2026
Currently, there are two problems with the riscv find_zero():
1. When !RISCV_ISA_ZBB, the generic fls64() generates suboptimal code.
But in the word-at-a-time case we don't have to take the fls64() code path;
instead, we can fall back to the generic word-at-a-time implementation.
What's more, fls64() performs unnecessary zero-bit counting on
RV32, where fls() is enough.
2. Similar to 1, the generic fls64() also generates suboptimal code when
RISCV_ISA_ZBB=y but the HW doesn't support Zbb.
So this series improves find_zero() by falling back to the generic
word-at-a-time implementation where necessary. This dramatically reduces
the instruction count of find_zero() from 33 to 8! Testing with the
micro-benchmark in patch 1 also shows that performance improves by
about 1150%!
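
For readers who don't have the two code paths in mind, here is a rough,
portable sketch of the idea. It is not the code from the patches: the
helper names are made up for illustration, uint64_t stands in for the
64-bit "unsigned long" case, the loop is only a stand-in for fls64(), and
the multiply constant is quoted from memory of the little-endian 64-bit
helper in the generic word-at-a-time header, so please check it against
the tree.

```c
#include <stdint.h>

#define ONES  0x0101010101010101ULL
#define HIGHS 0x8080808080808080ULL

/* Nonzero iff some byte of 'a' is zero. The lowest set bit is the 0x80
 * bit of the first (least significant) zero byte. */
static inline uint64_t has_zero_bits(uint64_t a)
{
	return (a - ONES) & ~a & HIGHS;
}

/* Turn those bits into a byte mask: 0xff for every byte below the
 * first zero byte, 0 everywhere else. */
static inline uint64_t make_zero_mask(uint64_t bits)
{
	bits = (bits - 1) & ~bits;
	return bits >> 7;
}

/* fls64-style path: find the highest set bit of the mask, then divide
 * by 8. The loop is a portable stand-in for fls64(). */
static inline unsigned int find_zero_fls(uint64_t mask)
{
	unsigned int n = 0;

	while (mask) {
		n++;
		mask >>= 1;
	}
	return n >> 3;
}

/* Generic-style path: one multiply and one shift count the 0xff bytes
 * of the mask directly. */
static inline unsigned int find_zero_mul(uint64_t mask)
{
	return (unsigned int)(mask * 0x0001020304050608ULL >> 56);
}
```

Both helpers return the byte index of the first zero byte; the second
one simply avoids the bit-scan loop that a software fls64() expands to.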
After that, we further improve find_zero() for Zbb by applying an
optimization similar to the one Linus made in commit f915a3e5b018 ("arm64:
word-at-a-time: improve byte count calculations for LE"), so that
we get similar improvements:
"The difference between the old and the new implementation is that
"count_zero()" ends up scheduling better because it is being done on a
value that is available earlier (before the final mask).
But more importantly, it can be implemented without the insane semantics
of the standard bit finding helpers that have the off-by-one issue and
have to special-case the zero mask situation."
On RV64 w/ Zbb, the new "find_zero()" ends up as just "ctz" plus the shift
right, which is then subsumed by the "add to final length".
This reduces the total instruction count from 7 to 3!
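
As a rough illustration of the ctz idea (again not the code from the
patch), the sketch below counts bytes directly on the value produced by
the zero-byte test, before any final mask is built. The helper names are
hypothetical, uint64_t stands in for the RV64 "unsigned long", and
__builtin_ctzll() is the GCC/Clang builtin, which I assume the compiler
maps to the Zbb "ctz" instruction when Zbb is enabled.

```c
#include <stdint.h>

#define ONES  0x0101010101010101ULL
#define HIGHS 0x8080808080808080ULL

/* Nonzero iff some byte of 'a' is zero. The lowest set bit is the 0x80
 * bit of the first (least significant) zero byte. */
static inline uint64_t zero_byte_bits(uint64_t a)
{
	return (a - ONES) & ~a & HIGHS;
}

/* Count trailing zeros, then divide by 8 to get the byte index.
 * 'bits' must be nonzero: the builtin is undefined for 0. */
static inline unsigned int find_zero_ctz(uint64_t bits)
{
	return (unsigned int)__builtin_ctzll(bits) >> 3;
}
```

The caller already knows 'bits' is nonzero at this point (that is how it
found out the word contains a zero byte), so no special case for a zero
input is needed.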
But I have no HW platform that supports Zbb, so I can't provide
performance numbers for the last patch; I have only built and
tested it on QEMU.
Jisheng Zhang (3):
riscv: word-at-a-time: improve find_zero() for !RISCV_ISA_ZBB
riscv: word-at-a-time: improve find_zero() without Zbb
riscv: word-at-a-time: improve find_zero() for Zbb
arch/riscv/include/asm/word-at-a-time.h | 47 +++++++++++++++++++++++--
1 file changed, 44 insertions(+), 3 deletions(-)
--
2.51.0