[PATCH 2/2] ARM: move VFP init to an earlier boot stage

Thu May 23 05:57:21 EDT 2013

On 23 May 2013 10:41, Russell King - ARM Linux <linux at arm.linux.org.uk> wrote:
> On Thu, May 23, 2013 at 09:28:24AM +0200, Ard Biesheuvel wrote:
>> In order to use the NEON/VFP unit in the kernel, we should
>> initialize it a bit earlier in the boot process so NEON users
>> that like to do a quick benchmark at load time (like the
>> xor_blocks or RAID-6 code) find the NEON/VFP unit already
>> enabled.
>
> So what are the results of those benchmarks that the RAID-6 code does?
>

Output captured from a Cortex-A15 @ 1.7 GHz:

    xor: measuring software checksum speed
       arm4regs  :  2261.600 MB/sec
       8regs     :  1771.600 MB/sec
       32regs    :  1441.600 MB/sec
       neon      :  3619.600 MB/sec
    xor: using function: neon (3619.600 MB/sec)

Output captured from a Cortex-A15 @ 1.7 GHz:

    raid6: int32x1    200 MB/s
    raid6: int32x2    304 MB/s
    raid6: int32x4    388 MB/s
    raid6: int32x8    423 MB/s
    raid6: neonx1     799 MB/s
    raid6: neonx2    1364 MB/s
    raid6: neonx4    1731 MB/s
    raid6: neonx8    1676 MB/s
    raid6: using algorithm neonx4 (1731 MB/s)
    raid6: using intx1 recovery algorithm

I can put up both patches for RFC if you like.

> This isn't something which will get accepted without there being a very
> strong case for it; firstly, we've always said "no FP in the kernel".
> Secondly, it makes it too easy for people to start thinking that FP is
> safe in the kernel.  It isn't - this will go horribly wrong if the VFP
> hardware bounces an instruction to the support code.
>

That is clear. In the two cases I have been playing with, the
begin/end calls are in distinct compilation units from the actual
functions containing the NEON code. Perhaps we should enforce this by
conditionally turning kernel_vfp_begin() into a macro which errors out
(i.e., if __ARM_NEON__ is defined)?

> Lastly, what happens if you sleep between kernel_vfp_start() and
> kernel_vfp_end() ?  That's a context switch which will mess up your
> VFP state.  I think you need to take a spinlock (not _bh nor _irqsave
> versions) in here to ensure that people don't get the idea that they
> _can_ sleep between these two.
>

So generally, sleeping under preempt_disable() (or get cpu()) in this
case) is not currently handled in that way?
That is a misunderstanding on my part, I will address it in the code.

Regards,
-- 
Ard.

> So, good attempt, but I think you need to think about this more, and
> also justify the need for it.