[PATCH 0/5] Kernel mode NEON for XOR and RAID6

Christopher Covington cov at codeaurora.org
Fri Jun 21 10:58:21 EDT 2013


Hi Ard,

On 06/21/2013 06:08 AM, Ard Biesheuvel wrote:
> On 21 June 2013 11:33, Will Deacon <will.deacon at arm.com> wrote:
>> On Sat, Jun 08, 2013 at 04:09:56AM +0100, Nicolas Pitre wrote:
>>> On Fri, 7 Jun 2013, Will Deacon wrote:
>>>> What's the earliest toolchain we claim to support nowadays? If that can't
>>>> deal with the intrinsics then we either need to bump the requirement, or
>>>> write this using hand-coded asm. In the case of the latter, I don't think
>>>> the maintenance overhead of having two implementations is worth it.
>>>
>>> We have many different minimum toolchain version requirements attached
>>> to different features being enabled already, ftrace being one of them if
>>> I remember correctly.  For these Neon optimizations the minimum gcc
>>> version is v4.6.
>>>
>>> Given that this is going to be interesting mostly to server systems, and
>>> given that ARM server deployments are rather new, I don't see the point
>>> of compiling a new server environment using an older gcc version.
>>
>> I've mulled over this, had some discussions with our toolchain guys and
>> have concluded the following:
>>
>>   - The intrinsics are actually ok. I was sceptical at first, but I've been
>>     assured that they should do a reasonable job (echoing your performance
>>     figures).
>>
>>   - The current approach is targeting servers and isn't (yet) suitable for
>>     mobile.
>>
>> So, given that the patches do the right thing wrt GCC version, the only
>> remaining point is that we need to keep an eye out for people trying to
>> re-use this stuff for mobile (likely crypto, as I mentioned earlier). When
>> that happens, we should consider revisiting the benchmark/power figures.
>>
> 
> OK, so a number of points have been raised in this discussion, let me
> address them one by one:
> 
> Should we allow NEON to be used in the kernel?
> 
> The consensus is not to allow floating point. However, NEON is
> different, as the performance gains are considerable and there is no
> dependency on support code, which makes it not as hairy as
> conventional (pre-v3) VFP. Also, managing the vfpstates is easily
> doable if NEON is only used outside interrupt context and with
> preemption disabled.
> 
> 
> Does my series implement it correctly?
> 
> I have addressed Russell's first round of comments. Happy to take
> another round if necessary.
> 
> 
> Should we allow NEON intrinsics in the kernel?
> Should we allow GCC-generated NEON in the kernel?
> 
> Only if the implementation is clear on which minimum version of GCC it
> requires. We could use my examples to set a precedent on what is a
> suitable way to use NEON intrinsics or the vectorizer in kernel code
> (which includes coding it such that it can be reused for arm64 with no
> modifications).
> 
> 
> Is kernel mode NEON suitable for mobile?
> 
> To me, it is unclear why kernel and userland are so different in this
> respect. However, kernel mode NEON is separately configurable from
> Kconfig so it can be disabled at will.
> 
> 
> Is there a point to doing a boot time benchmark to select the optimal
> implementation of an algorithm?
> 
> Perhaps not, but that question is unrelated to kernel mode NEON.

If this is indeed the consensus (I don't disagree with any of it myself),
perhaps committing the main points, guidelines, and examples to
Documentation/arm/* would be useful.
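
As a rough illustration of the kind of example such a document might carry,
here is a minimal userspace sketch of the "let the vectorizer do it" style
discussed above: plain C with no intrinsics, so the same source could in
principle be built for arm64 unchanged. The function name xor_block_2 and
the build flags mentioned in the comment are assumptions for illustration,
not taken from the series. In the kernel, a caller would additionally have
to bracket the call with the begin/end helpers the series adds (I believe
these are kernel_neon_begin() and kernel_neon_end()) and fall back to a
non-NEON routine in interrupt context.

```c
#include <stddef.h>

/*
 * Hypothetical example: XOR 'bytes' bytes of src into dst, one
 * unsigned long at a time. 'bytes' is assumed to be a multiple of
 * sizeof(unsigned long); any remainder is ignored.
 *
 * There is nothing NEON-specific in the source. The 'restrict'
 * qualifiers tell the compiler the buffers do not overlap, which is
 * what lets GCC's vectorizer turn the loop into SIMD loads, XORs and
 * stores when built with something like:
 *     gcc -O2 -ftree-vectorize -mfpu=neon   (32-bit ARM, assumed flags)
 */
void xor_block_2(size_t bytes, unsigned long *restrict dst,
                 const unsigned long *restrict src)
{
	size_t words = bytes / sizeof(unsigned long);
	size_t i;

	for (i = 0; i < words; i++)
		dst[i] ^= src[i];
}
```

Whether the compiler actually emits NEON for this loop depends on the GCC
version and flags, which is why stating the minimum supported version next
to such code matters.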

Christopher

-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by the Linux Foundation.


