[PATCHv2 1/6] arm64: lib: Implement optimized memcpy routine

Catalin Marinas catalin.marinas at arm.com
Fri May 9 07:13:09 PDT 2014


On Mon, Apr 28, 2014 at 06:11:29AM +0100, zhichang.yuan at linaro.org wrote:
> This patch, based on Linaro's Cortex Strings library, improves
> the performance of the assembly optimized memcpy() function.
[...]
> --- a/arch/arm64/lib/memcpy.S
> +++ b/arch/arm64/lib/memcpy.S
[...]
>  ENTRY(memcpy)
[...]
> +	mov	dst, dstin
> +	cmp	count, #16
> +	/* When the length is less than 16, the accesses are not aligned. */
> +	b.lo	.Ltiny15
> +
> +	neg	tmp2, src
> +	ands	tmp2, tmp2, #15	/* Bytes to reach alignment. */
> +	b.eq	.LSrcAligned
> +	sub	count, count, tmp2

I started looking at this and comparing it to the original Cortex
Strings library. Is there any reason why at least the first part has
been rewritten? For example, the Cortex Strings version starts with
probably the most likely case, comparing the count with 64.

-- 
Catalin
