[PATCH 7/8] arm64: Better optimised memchr()
Catalin Marinas
catalin.marinas at arm.com
Fri May 14 07:55:13 PDT 2021
On Tue, May 11, 2021 at 05:12:37PM +0100, Robin Murphy wrote:
> Although we implement our own assembly version of memchr(), it turns
> out to be barely any better than what GCC can generate for the generic
> C version (and would go wrong if the size_t argument were ever large
> enough to be interpreted as negative). Unfortunately we can't import the
> tuned implementation from the Arm optimized-routines library, since that
> has some Advanced SIMD parts which are not really viable for general
> kernel library code. What we can do, however, is pep things up with some
> relatively straightforward word-at-a-time logic for larger calls.
>
> Adding some timing to optimized-routines' memchr() test for a simple
> benchmark, overall this version comes in around half as fast as the SIMD
> code, but still nearly 4x faster than our existing implementation.
>
> Signed-off-by: Robin Murphy <robin.murphy at arm.com>
I haven't reviewed the code yet, but I'm wondering: could we write this
in C using load_unaligned_zeropad()?
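
For illustration only, the "word-at-a-time logic" described in the commit
message could look roughly like the userspace sketch below. This is not the
code from the patch, and memchr_wordwise(), has_zero_byte(), ONES and HIGHS
are hypothetical names. Since load_unaligned_zeropad() is a kernel-only
helper, the sketch assumes a simpler approach: it only loads whole words
that lie entirely inside the buffer (via memcpy(), which is safe for
unaligned addresses) and leaves the tail to a byte loop.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ONES	((uint64_t)0x0101010101010101ULL)
#define HIGHS	((uint64_t)0x8080808080808080ULL)

/* Nonzero if and only if at least one byte of v is zero. */
static inline uint64_t has_zero_byte(uint64_t v)
{
	return (v - ONES) & ~v & HIGHS;
}

/* Hypothetical name; a sketch, not the kernel's memchr(). */
static void *memchr_wordwise(const void *s, int c, size_t n)
{
	const unsigned char *p = s;
	const unsigned char ch = (unsigned char)c;
	/* Replicate the target byte into every byte lane of a word. */
	const uint64_t pattern = ONES * ch;

	/* Skip 8 bytes at a time while no lane matches the target. */
	while (n >= sizeof(uint64_t)) {
		uint64_t w;

		memcpy(&w, p, sizeof(w));	/* unaligned-safe load */
		if (has_zero_byte(w ^ pattern))
			break;			/* match is in this word */
		p += sizeof(w);
		n -= sizeof(w);
	}

	/* Locate the exact byte in the flagged word, or scan the tail. */
	while (n--) {
		if (*p == ch)
			return (void *)p;
		p++;
	}
	return NULL;
}
```

Because n stays an unsigned size_t throughout, a very large count is never
interpreted as negative. Falling back to the byte loop to pin down the match
also keeps the sketch independent of endianness, at the cost of a few extra
iterations compared with a bit-scan on the mask.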
--
Catalin