[PATCH 7/8] arm64: Better optimised memchr()

Catalin Marinas catalin.marinas at arm.com
Fri May 14 07:55:13 PDT 2021


On Tue, May 11, 2021 at 05:12:37PM +0100, Robin Murphy wrote:
> Although we implement our own assembly version of memchr(), it turns
> out to be barely any better than what GCC can generate for the generic
> C version (and would go wrong if the size_t argument were ever large
> enough to be interpreted as negative). Unfortunately we can't import the
> tuned implementation from the Arm optimized-routines library, since that
> has some Advanced SIMD parts which are not really viable for general
> kernel library code. What we can do, however, is pep things up with some
> relatively straightforward word-at-a-time logic for larger calls.
> 
> Adding some timing to optimized-routines' memchr() test for a simple
> benchmark, overall this version comes in around half as fast as the SIMD
> code, but still nearly 4x faster than our existing implementation.
> 
> Signed-off-by: Robin Murphy <robin.murphy at arm.com>

I haven't reviewed the code yet, but I'm wondering: could we write this
in C using load_unaligned_zeropad()?
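
For illustration only, here is a rough userspace sketch of what
word-at-a-time memchr() logic could look like in C. It is not the code
from the patch. The name wat_memchr is hypothetical, and the sketch
assumes plain memcpy() loads plus a bytewise tail rather than
load_unaligned_zeropad(), which is a kernel-only helper and so can't be
used in a standalone example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ONES  ((uint64_t)0x0101010101010101ULL)
#define HIGHS ((uint64_t)0x8080808080808080ULL)

/* Hypothetical name; not a kernel or libc symbol. */
static void *wat_memchr(const void *s, int c, size_t n)
{
	const unsigned char *p = s;
	const unsigned char ch = (unsigned char)c;
	const uint64_t pattern = ONES * ch;	/* target byte in every lane */

	assert(s != NULL || n == 0);

	/* Whole 8-byte words; memcpy() keeps the unaligned load well defined. */
	while (n >= sizeof(uint64_t)) {
		uint64_t word, x;

		memcpy(&word, p, sizeof(word));
		x = word ^ pattern;		/* matching bytes become zero */
		if ((x - ONES) & ~x & HIGHS)	/* nonzero iff some byte of x is zero */
			break;			/* locate it bytewise below */
		p += sizeof(word);
		n -= sizeof(word);
	}

	/* Remaining tail, or the word flagged above. */
	for (; n; p++, n--)
		if (*p == ch)
			return (void *)p;
	return NULL;
}
```

Finding the exact byte with a short loop keeps the sketch independent of
endianness. A tuned version would more likely derive the offset from the
mask directly, and would need some way to handle the final partial word
without reading past the end of the buffer.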

-- 
Catalin


