[PATCH V2] arm64: optimized copy_to_user and copy_from_user assembly code

zhichang.yuan zhichang.yuan at linaro.org
Tue Aug 12 20:13:12 PDT 2014


Hi Feng,

On 2014-08-12 02:05, Feng Kan wrote:
> On Sun, Aug 10, 2014 at 8:01 PM, Radha Mohan <mohun106 at gmail.com> wrote:
>> Hi Feng,
>>
>>
>>> +
>>> +.Lcpy_not_short:
>>> +       /*
>>> +        * We don't much care about the alignment of DST, but we want SRC
>>> +        * to be 128-bit (16 byte) aligned so that we don't cross cache line
>>> +        * boundaries on both loads and stores.
>>> +        */
>> Could you please tell why is destination alignment not an issue? Is
>> this a generic implementation that you are referring to or specific to
>> your platform?
> This is per the Linaro Cortex Strings optimization routines.
>
> https://launchpad.net/cortex-strings
>
> Zhichang submitted something similar for the memcpy from the
> same optimization.
>
> Sorry, resending in text mode.

If both dst and src are unaligned and their alignment offsets differ, I haven't found a better way to
handle it.
Fortunately, ARMv8 supports unaligned memory access.
When I started work on this patch, I also thought it might be better for all loads and stores to be aligned.
I wrote the code like the ARMv7 memcpy: it first loaded the data from SRC, buffered it in several registers,
combined it into a new 16-byte word, and then stored that to the aligned DST. But the performance was
slightly worse.
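
As a rough illustration of the approach that was kept (align SRC only and let the stores to DST be
unaligned), here is a minimal sketch in C. It is not the assembly from the patch: the function name
copy_src_aligned is hypothetical, it does no user-access fault handling, and the fixed-size memcpy calls
only stand in for the 16-byte load and store instructions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper, not taken from the patch: copy n bytes while
 * aligning only SRC to 16 bytes. DST is left at whatever alignment it has.
 */
static void *copy_src_aligned(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Head: number of bytes until SRC reaches a 16-byte boundary. */
    size_t head = (size_t)(-(uintptr_t)s & 15);
    if (head > n)
        head = n;
    memcpy(d, s, head);
    d += head;
    s += head;
    n -= head;

    /* Body: SRC is now 16-byte aligned; DST may still be unaligned. */
    while (n >= 16) {
        unsigned char tmp[16];
        memcpy(tmp, s, 16);  /* load from aligned SRC */
        memcpy(d, tmp, 16);  /* store to possibly unaligned DST */
        d += 16;
        s += 16;
        n -= 16;
    }

    /* Tail: fewer than 16 bytes remain. */
    memcpy(d, s, n);
    return dst;
}
```

The alternative I tried first would replace the body loop with aligned loads plus shift-and-combine steps
so that the stores are aligned as well; the sketch above avoids that extra register work.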

~Zhichang

>>> --
>>> 1.9.1