[PATCH 4/5] arm64: lib: Use MOPS for memcpy() routines

Catalin Marinas catalin.marinas at arm.com
Thu Oct 17 04:57:22 PDT 2024


On Wed, Oct 16, 2024 at 02:08:27PM +0100, Kristina Martsenko wrote:
> On 04/10/2024 11:07, Catalin Marinas wrote:
> > On Thu, Oct 03, 2024 at 05:46:08PM +0100, Kristina Martsenko wrote:
> >> On 02/10/2024 16:29, Catalin Marinas wrote:
> >>> On Mon, Sep 30, 2024 at 05:10:50PM +0100, Kristina Martsenko wrote:
> >>>> diff --git a/arch/arm64/lib/memcpy.S b/arch/arm64/lib/memcpy.S
> >>>> index 4ab48d49c451..9b99106fb95f 100644
> >>>> --- a/arch/arm64/lib/memcpy.S
> >>>> +++ b/arch/arm64/lib/memcpy.S
> >>>> @@ -57,7 +57,7 @@
> >>>>     The loop tail is handled by always copying 64 bytes from the end.
> >>>>  */
> >>>>  
> >>>> -SYM_FUNC_START(__pi_memcpy)
> >>>> +SYM_FUNC_START_LOCAL(__pi_memcpy_generic)
> >>>>  	add	srcend, src, count
> >>>>  	add	dstend, dstin, count
> >>>>  	cmp	count, 128
> >>>> @@ -238,7 +238,24 @@ L(copy64_from_start):
> >>>>  	stp	B_l, B_h, [dstin, 16]
> >>>>  	stp	C_l, C_h, [dstin]
> >>>>  	ret
> >>>> +SYM_FUNC_END(__pi_memcpy_generic)
> >>>> +
> >>>> +#ifdef CONFIG_AS_HAS_MOPS
> >>>> +	.arch_extension mops
> >>>> +SYM_FUNC_START(__pi_memcpy)
> >>>> +alternative_if_not ARM64_HAS_MOPS
> >>>> +	b	__pi_memcpy_generic
> >>>> +alternative_else_nop_endif
> >>>
> >>> I'm fine with patching the branch but I wonder whether, for the time
> >>> being, we should use alternative_if instead and let the NOP fall through
> >>> to the default implementation. The hardware in the field doesn't have
> >>> FEAT_MOPS yet and may see a slight penalty introduced by the branch,
> >>> especially for small memcpys. Just guessing, I haven't done any
> >>> benchmarks.
> >>
> >> My thinking was that this way it doesn't have to be changed again in the
> >> future. But I'm fine with switching to alternative_if for v2.
> > 
> > The other option is to benchmark the proposed patches a bit and see if
> > we notice any difference on current hardware. Not sure exactly what
> > benchmarks would exercise these paths. For copy_page(), I suspect the
> > branch is probably lost in the noise. It's more like small copies that
> > might notice.
> > 
> > Yet another option is to leave the patches as they are and, if anyone
> > complains, swap them over then ;).
> 
> I tried benchmarking a kernel build and hackbench on a Morello board (with
> usercopy patches applied as well) but didn't see any significant performance
> difference between the branch and NOP, so I would leave the patches as they are.

That's great. Thanks for checking.

-- 
Catalin


