[PATCH 7/8] arm64: Better optimised memchr()

Robin Murphy robin.murphy at arm.com
Fri May 14 11:38:31 PDT 2021


On 2021-05-14 15:55, Catalin Marinas wrote:
> On Tue, May 11, 2021 at 05:12:37PM +0100, Robin Murphy wrote:
>> Although we implement our own assembly version of memchr(), it turns
>> out to be barely any better than what GCC can generate for the generic
>> C version (and would go wrong if the size_t argument were ever large
>> enough to be interpreted as negative). Unfortunately we can't import the
>> tuned implementation from the Arm optimized-routines library, since that
>> has some Advanced SIMD parts which are not really viable for general
>> kernel library code. What we can do, however, is pep things up with some
>> relatively straightforward word-at-a-time logic for larger calls.
>>
>> Adding some timing to optimized-routines' memchr() test for a simple
>> benchmark, overall this version comes in around half as fast as the SIMD
>> code, but still nearly 4x faster than our existing implementation.
>>
>> Signed-off-by: Robin Murphy <robin.murphy at arm.com>
> 
> I haven't reviewed the code yet but wondering - could we write this in C
> using load_unaligned_zeropad()?

I've had a hack around with a couple of C implementations this 
afternoon, and they seem to come out roughly 85% as fast as this asm 
version. I'm not sure how much extra overhead load_unaligned_zeropad() 
would add with wiggling PSTATE.TCO all the time, though.
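The "word-at-a-time logic" mentioned in the patch description can be sketched in portable C. This is a minimal userspace sketch, assuming a 64-bit word and using memcpy() for unaligned loads. It is not the asm in this patch, nor either of the C variants benchmarked above, and `wordwise_memchr` is a hypothetical name. Because load_unaligned_zeropad() is kernel-only, the sketch never reads past the buffer and finishes the tail bytewise instead.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper for illustration; not the kernel's memchr(). */
static void *wordwise_memchr(const void *s, int c, size_t n)
{
	const unsigned char *p = s;
	unsigned char ch = (unsigned char)c;	/* memchr() semantics */
	const uint64_t lo = 0x0101010101010101ULL;
	const uint64_t hi = 0x8080808080808080ULL;
	const uint64_t rep = lo * ch;	/* target byte in every lane */

	/* n is unsigned: compare against the word size, never test a sign. */
	while (n >= sizeof(uint64_t)) {
		uint64_t w;

		memcpy(&w, p, sizeof(w));	/* portable unaligned load */
		w ^= rep;			/* matching bytes become 0x00 */
		/* Non-zero iff at least one byte of w is zero. */
		if ((w - lo) & ~w & hi)
			break;	/* match is in these 8 bytes; find it below */
		p += sizeof(w);
		n -= sizeof(w);
	}

	/* Scan the tail, or the single word known to contain a match. */
	for (; n; p++, n--)
		if (*p == ch)
			return (void *)p;
	return NULL;
}
```

The loop condition `n >= sizeof(uint64_t)` keeps the length unsigned throughout, which sidesteps the failure mode the patch description mentions for sizes large enough to look negative. The zero-byte test gives an exact yes/no per word, but it does not pinpoint which byte matched, which is why the sketch falls back to a byte loop to locate it.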

Robin.


