[RFC] arm: use built-in byte swap function

Thu Feb 7 05:43:03 EST 2013

On 7 February 2013 10:19, Will Newton <will.newton at gmail.com> wrote:
> On Thu, Feb 7, 2013 at 1:19 AM, Kim Phillips <kim.phillips at freescale.com> wrote:
>> On Wed, 6 Feb 2013 09:02:04 +0000
>> "Woodhouse, David" <david.woodhouse at intel.com> wrote:
>>
>>> On Tue, 2013-02-05 at 21:04 -0600, Kim Phillips wrote:
>>> > gcc -Os emits calls to __bswapsi2 on those platforms to save space
>>> > because they don't have the single rev byte swap instruction.
>>>
>>> Is that the right thing for GCC to do in that situation?
>>
>> if it saves space, why wouldn't it be?
>>
>> "Many of these functions are only optimized in certain cases; if they
>> are not optimized in a particular case, a call to the library
>> function is emitted." [1]
>>
>> I see "(arm_arch6 || !optimize_size)" in gcc's define_expand
>> "bswapsi2" source, so GCC treats size optimization as one of
>> those legitimate cases.
>>
>>> If so, perhaps we should be *providing* __bswap[sd]i2 functions for it
>>> to use?
>>
>> either that, or link with libgcc - why does arch/arm64 do this and
>> arch/arm not?  It's not obvious from git log.
>
> One reason I have found (I don't know if it is the canonical one) is
> that linking with libgcc allows people to use any intrinsic, e.g. the
> soft-float routines, in the kernel without noticing it. If you limit
> the intrinsics to the ones explicitly linked into the kernel, this
> cannot happen.
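
As a rough sketch of the "provide the helpers ourselves" option (limiting the kernel to explicitly linked intrinsics instead of pulling in libgcc), portable C versions of the two helpers gcc -Os may call could look like the following. The prototypes are an assumption: libgcc documents them with signed 32- and 64-bit types, and unsigned types of the same width are used here for clarity. Written in C, the 32-bit version also risks gcc recognising the shift-and-mask pattern and turning it back into a __bswapsi2 call on exactly the targets that need it, so an in-kernel version would probably be safer in assembly.

```c
#include <stdint.h>

/*
 * Sketch of a fallback for the libgcc helper that gcc -Os calls when it
 * will not inline a 32-bit byte swap (e.g. pre-ARMv6, no "rev").
 * Assumption: unsigned types stand in for libgcc's signed prototypes.
 */
uint32_t __bswapsi2(uint32_t x)
{
	return ((x & 0x000000ffu) << 24) |
	       ((x & 0x0000ff00u) <<  8) |
	       ((x & 0x00ff0000u) >>  8) |
	       ((x & 0xff000000u) >> 24);
}

/* 64-bit swap: swap each 32-bit half, then exchange the halves. */
uint64_t __bswapdi2(uint64_t x)
{
	return ((uint64_t)__bswapsi2((uint32_t)x) << 32) |
	       (uint64_t)__bswapsi2((uint32_t)(x >> 32));
}
```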

For arm64 we explicitly pass -mgeneral-regs-only to avoid generating
any floating-point code, and soft-float is excluded by the ABI
automatically. But we use other compiler intrinsics like __ffs, and
while these are currently generated inline, there is no guarantee that
they always will be; hence the linking with libgcc.
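
To illustrate why "generated inline today" is no guarantee: the two helpers below (hypothetical names, in the style of __ffs and a 32-bit byte swap) are built on gcc builtins. Whether each builtin becomes a single instruction or a call into a libgcc helper is decided by the target and compiler flags, not by the source, so the same code can start needing libgcc at link time after a target or option change.

```c
#include <stdint.h>

/*
 * Hypothetical __ffs-style helper: 0-based index of the lowest set bit.
 * Undefined for word == 0, exactly like __builtin_ctz itself.
 * The compiler may emit one instruction or a libgcc call here.
 */
static inline unsigned int first_set_bit(uint32_t word)
{
	return (unsigned int)__builtin_ctz(word);
}

/*
 * Hypothetical 32-bit byte swap via the builtin. On ARMv6+ this can be a
 * single "rev"; per the thread, gcc -Os on older cores calls __bswapsi2.
 */
static inline uint32_t swab32_builtin(uint32_t x)
{
	return __builtin_bswap32(x);
}
```

Building such code with -Os for a pre-ARMv6 target and listing the object's undefined symbols (for example with nm) is one way to see which case a given toolchain picks.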

> I have also seen cases where the libgcc intrinsics have improved over
> time; having the code in the kernel allows these improvements to be
> rolled into the kernel even if the user has an older toolchain.

Indeed, the gcc guys do a lot of benchmarking and optimisation on a
wide range of processors, so we can take advantage of that in the
kernel. But it's much easier on arm64, since the architecture is
stable. On 32-bit arm we have to cope with a range of architecture
versions with variations in the instruction set.
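
As an example of such a variation, the single rev instruction only exists from ARMv6 onward, so older cores need a multi-instruction sequence. A minimal sketch of choosing between the two at compile time follows; it assumes the compiler defines the ACLE __ARM_ARCH macro (older toolchains only define per-version macros such as __ARM_ARCH_6__), and the function name is hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical arch-dependent 32-bit byte swap.
 * Assumption: __ARM_ARCH (ACLE) is defined by the compiler; on toolchains
 * that lack it, this always takes the portable fallback path.
 */
static inline uint32_t swab32_arch(uint32_t x)
{
#if defined(__arm__) && defined(__ARM_ARCH) && __ARM_ARCH >= 6
	uint32_t out;

	/* ARMv6 and later: one "rev" instruction does the whole swap. */
	__asm__("rev %0, %1" : "=r"(out) : "r"(x));
	return out;
#else
	/* Pre-ARMv6 ARM, or a non-ARM host: shift-and-mask fallback. */
	return (x << 24) |
	       ((x & 0x0000ff00u) << 8) |
	       ((x >> 8) & 0x0000ff00u) |
	       (x >> 24);
#endif
}
```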

-- 
Catalin