Call for testing/opinions: Optimized memset/memcpy
Ard Biesheuvel
ard.biesheuvel at linaro.org
Sun Jul 14 07:37:44 EDT 2013
On 14 July 2013 13:19, Harm Hanemaaijer <fgenfb at yahoo.com> wrote:
> Dr. David Alan Gilbert <gilbertd <at> treblig.org> writes:
>>
>> Maybe neon is worth a try these days (although be careful of platforms
>> like Tegra 2 that doesn't have it); there was a recent patch that enabled
>> use in the kernel (I think for some RAID use). The downside is it's
>> supposed to be quite power hungry.
>>
>
> As it turns out, NEON isn't too hard to implement. I have added NEON support
> to copy_page, memset, memzero, and memcpy (both for the aligned and unaligned
> case) in my userspace testing environment. It gives a nice boost (ranging
> from 10% for copy_page to >30% for unaligned memcpy on a Cortex A8), which
> can potentially be more on other cores. Although I have not tested a live
> kernel yet, it looks like NEON can be used fairly transparently #ifdefed on
> the CONFIG_NEON kernel definition as long as only the lower end of the
> NEON/vfp register file is clobbered (although this needs verification).
>
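For anyone following along, a minimal userspace sketch of the kind of
#ifdef-guarded NEON copy described in the quoted text might look like this.
It is not Harm's code: the name copy_block is hypothetical, and the guard is
the compiler's __ARM_NEON macro rather than the kernel's CONFIG_NEON, on the
assumption that this is built as an ordinary userspace program.

```c
#include <stddef.h>
#include <stdint.h>

#ifdef __ARM_NEON
#include <arm_neon.h>
#endif

/* Hypothetical helper: copy n bytes from src to dst (buffers must not overlap). */
static void *copy_block(void *dst, const void *src, size_t n)
{
    uint8_t *d = dst;
    const uint8_t *s = src;

#ifdef __ARM_NEON
    /* Move 16 bytes per iteration through one 128-bit q register. */
    while (n >= 16) {
        vst1q_u8(d, vld1q_u8(s));
        d += 16;
        s += 16;
        n -= 16;
    }
#endif
    /* Tail bytes, or the whole buffer when NEON is not available. */
    while (n--)
        *d++ = *s++;

    return dst;
}
```

Built without NEON support, the same function falls back to the byte loop,
so it can be compared against the NEON build on the same test data.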
You will clobber the userland contents of the NEON register file if
you don't preserve them properly. Also, kernel preemption (if enabled)
may put your task to sleep at any time, and the context switching
machinery is totally oblivious to NEON being used in the kernel, so
the kernel-side NEON state will get corrupted as well in that case.
I have a patch series pending (i.e., accepted but not pulled yet by
Russell) which addresses these issues.
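To make the first hazard concrete, here is a toy userspace model, not kernel
code. The register file is a plain array, and every name in it (neon_regs,
fast_fill, fast_fill_preserving) is made up for illustration. A real fix also
has to deal with preemption while the registers are in use, which a userspace
model like this cannot show.

```c
#include <stdint.h>
#include <string.h>

#define NUM_QREGS 16

/* Toy stand-in for the NEON register file; hypothetical, not a kernel type. */
struct neon_regs {
    uint8_t q[NUM_QREGS][16];
};

/* Simulates an accelerated routine that uses q0-q3 as scratch registers. */
static void fast_fill(struct neon_regs *regs, uint8_t value)
{
    for (int r = 0; r < 4; r++)
        memset(regs->q[r], value, sizeof(regs->q[r]));
}

/* Same routine, but the owner's register contents survive the call. */
static void fast_fill_preserving(struct neon_regs *regs, uint8_t value)
{
    struct neon_regs saved = *regs;   /* save the owner's state */
    fast_fill(regs, value);           /* scratch use clobbers q0-q3 */
    *regs = saved;                    /* restore before returning */
}
```

Calling fast_fill directly leaves q0-q3 overwritten even though the upper
registers are untouched, which is why clobbering only the lower end of the
register file is still visible to whoever owned those registers.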
--
Ard.