[PATCH 4/5] arm64: lib: Use MOPS for memcpy() routines
Kristina Martsenko
kristina.martsenko at arm.com
Wed Oct 16 06:08:27 PDT 2024
On 04/10/2024 11:07, Catalin Marinas wrote:
> On Thu, Oct 03, 2024 at 05:46:08PM +0100, Kristina Martsenko wrote:
>> On 02/10/2024 16:29, Catalin Marinas wrote:
>>> On Mon, Sep 30, 2024 at 05:10:50PM +0100, Kristina Martsenko wrote:
>>>> diff --git a/arch/arm64/lib/memcpy.S b/arch/arm64/lib/memcpy.S
>>>> index 4ab48d49c451..9b99106fb95f 100644
>>>> --- a/arch/arm64/lib/memcpy.S
>>>> +++ b/arch/arm64/lib/memcpy.S
>>>> @@ -57,7 +57,7 @@
>>>> The loop tail is handled by always copying 64 bytes from the end.
>>>> */
>>>>
>>>> -SYM_FUNC_START(__pi_memcpy)
>>>> +SYM_FUNC_START_LOCAL(__pi_memcpy_generic)
>>>> add srcend, src, count
>>>> add dstend, dstin, count
>>>> cmp count, 128
>>>> @@ -238,7 +238,24 @@ L(copy64_from_start):
>>>> stp B_l, B_h, [dstin, 16]
>>>> stp C_l, C_h, [dstin]
>>>> ret
>>>> +SYM_FUNC_END(__pi_memcpy_generic)
>>>> +
>>>> +#ifdef CONFIG_AS_HAS_MOPS
>>>> + .arch_extension mops
>>>> +SYM_FUNC_START(__pi_memcpy)
>>>> +alternative_if_not ARM64_HAS_MOPS
>>>> + b __pi_memcpy_generic
>>>> +alternative_else_nop_endif
>>>
>>> I'm fine with patching the branch but I wonder whether, for the time
>>> being, we should use alternative_if instead and the NOP to fall through
>>> the default implementation. The hardware in the field doesn't have
>>> FEAT_MOPS yet and they may see a slight penalty introduced by the
>>> branch, especially for small memcpys. Just guessing, I haven't done any
>>> benchmarks.
>>
>> My thinking was that this way it doesn't have to be changed again in the
>> future. But I'm fine with switching to alternative_if for v2.
>
> The other option is to benchmark the proposed patches a bit and see if
> we notice any difference on current hardware. Not sure exactly what
> benchmarks would exercise these paths. For copy_page(), I suspect the
> branch is probably lost in the noise. It's more like small copies that
> might notice.
>
> Yet another option is to leave the patches as they are and see if anyone
> complains, we swap them over then ;).

I tried benchmarking a kernel build and hackbench on a Morello board (with
the usercopy patches applied as well) but didn't see any significant
performance difference between the branch and the NOP, so I would leave the
patches as they are.

Thanks,
Kristina
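
For reference, a rough sketch of the alternative_if variant discussed above:
the unpatched code is a single NOP that falls through into the generic
implementation, and the branch is patched in only on CPUs that have
FEAT_MOPS. The __pi_memcpy_mops label is hypothetical and is not part of the
posted patch, and the CONFIG_AS_HAS_MOPS guard is left out for brevity. This
is untested and only meant to show the shape of the change, not a proposed
v2.

```
SYM_FUNC_START(__pi_memcpy)
alternative_if ARM64_HAS_MOPS
	b	__pi_memcpy_mops	// hypothetical label, patched in only with FEAT_MOPS
alternative_else_nop_endif
	/* Without FEAT_MOPS the line above stays a NOP and we fall through. */
	add	srcend, src, count
	add	dstend, dstin, count
	cmp	count, 128
	/* remainder of the generic implementation, unchanged */
```

With this layout current hardware pays for one NOP rather than one taken
branch on every call, at the cost of having to swap the alternative around
again once FEAT_MOPS hardware is the common case.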