[PATCH] ARM: force linker to use PIC veneers

Tue Mar 24 10:35:56 PDT 2015

On 24 March 2015 at 14:54, Dave Martin <Dave.Martin at arm.com> wrote:
> On Tue, Mar 24, 2015 at 01:50:40PM +0100, Ard Biesheuvel wrote:
>> On 24 March 2015 at 13:22, Dave Martin <Dave.Martin at arm.com> wrote:
>> > On Tue, Mar 24, 2015 at 11:16:24AM +0100, Ard Biesheuvel wrote:
>> >> When building a very large kernel, it is up to the linker to decide
>> >> when and where to insert stubs to allow calls to functions that are
>> >> out of range for the ordinary b/bl instructions.
>> >>
>> >> However, since the kernel is built as a position dependent binary,
>> >> these stubs (aka veneers) may contain absolute addresses, which will
>> >> break such veneer assisted far calls performed with the MMU off.
>> >>
>> >> For instance, the call from __enable_mmu() in the .head.text section
>> >> to __turn_mmu_on() in the .idmap.text section may be turned into
>> >> something like this:
>> >>
>> >> c0008168 <__enable_mmu>:
>> >> c0008168:       f020 0002       bic.w   r0, r0, #2
>> >> c000816c:       f420 5080       bic.w   r0, r0, #4096
>> >> c0008170:       f000 b846       b.w     c0008200 <____turn_mmu_on_veneer>
>> >> [...]
>> >> c0008200 <____turn_mmu_on_veneer>:
>> >> c0008200:       4778            bx      pc
>> >> c0008202:       46c0            nop
>> >> c0008204:       e59fc000        ldr     ip, [pc]
>> >> c0008208:       e12fff1c        bx      ip
>> >> c000820c:       c13dfae1        teqgt   sp, r1, ror #21
>> >> [...]
>> >> c13dfae0 <__turn_mmu_on>:
>> >> c13dfae0:       4600            mov     r0, r0
>> >> [...]
>> >>
>> >> After adding --pic-veneer to the LDFLAGS, the veneer is emitted like
>> >> this instead:
>> >>
>> >> c0008200 <____turn_mmu_on_veneer>:
>> >> c0008200:       4778            bx      pc
>> >> c0008202:       46c0            nop
>> >> c0008204:       e59fc004        ldr     ip, [pc, #4]
>> >> c0008208:       e08fc00c        add     ip, pc, ip
>> >> c000820c:       e12fff1c        bx      ip
>> >> c0008210:       013d7d31        teqeq   sp, r1, lsr sp
>> >> c0008214:       00000000        andeq   r0, r0, r0
>> >>
>> >> Note that this particular example is best addressed by moving
>> >> .head.text and .idmap.text closer together, but this issue could
>> >> potentially affect any code that needs to execute with the
>> >> MMU off.
>> >>
>> >> Signed-off-by: Ard Biesheuvel <ard.biesheuvel at linaro.org>
>> >
>> > Although that fixes the problem, wouldn't this introduce extra potential
>> > overhead for every call in the kernel?
>> >
>>
>> It does not change whether a veneer is emitted or not, it only affects
>> the PIC nature of it.
>> So the overhead is 1 additional word for the add instruction, which I
>
> You're right, I slightly misunderstood what is going on there.
>
>> think is a small price to pay for correctness, especially considering
>> that someone building such a big kernel obviously does not optimize
>> for size.
>>
>> > How many such veneers get added in your kernel configuration, and
>> > how many are actually necessary (i.e., calls between MMU-off code and
>> > elsewhere)?
>> >
>>
>> Very few. In addition to the example (which will be addressed in
>> another way regardless) there are some resume functions that get
>> allocated in .data, and those would need it as well. I have also
>> proposed b_far/bl_far macros that could be used there as well.
>>
>> The primary concern is that you can't really check whether any
>> problematic veneers have been emitted, unless all code that may run
>> with the MMU off is moved to the idmap.text section.
>
> That's a valid argument.
>
> Come to think of it, I can't think of a good reason why we don't
> pass --use-blx to the linker for THUMB2_KERNEL.  I think that would
> at least make these sequences a bit less painful by getting rid of
> the "bx pc" stuff.
>

Well, passing --use-blx doesn't seem to have the desired effect. I
still get veneers that start with the "bx pc" sequence, like this one:

c0181ba8 <___raw_spin_lock_veneer>:
c0181ba8:       4778            bx      pc
c0181baa:       46c0            nop                     ; (mov r8, r8)
[...]
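
To make the address arithmetic in these veneers explicit, here is a
minimal Python sketch. It is only a model of the instruction sequences
quoted in this thread, not something taken from the linker; the function
names and the load offset of -0x40000000 are assumptions made up for
illustration. It relies on two architectural facts: in Thumb state PC
reads as the instruction address plus 4, and in ARM state as the
instruction address plus 8.

```python
MASK32 = 0xFFFFFFFF

def thumb_bx_pc_entry(veneer_addr):
    # Thumb `bx pc`: PC reads as insn address + 4 with bit 0 clear,
    # so execution continues in ARM state just after the nop.
    return (veneer_addr + 4) & MASK32

def absolute_veneer_target(literal):
    # `ldr ip, [pc]; bx ip`: the literal word is the destination itself,
    # i.e. a link-time (virtual) address.
    return literal & MASK32

def pic_veneer_target(add_insn_addr, literal):
    # `ldr ip, [pc, #4]; add ip, pc, ip; bx ip`: the literal is an offset
    # from the PC seen by the add (its own address + 8).
    return (add_insn_addr + 8 + literal) & MASK32

# Veneer entry for the ___raw_spin_lock_veneer listing above.
print(hex(thumb_bx_pc_entry(0xc0181ba8)))          # 0xc0181bac

# Values taken from the two ____turn_mmu_on_veneer listings.
print(hex(absolute_veneer_target(0xc13dfae1)))     # 0xc13dfae1
print(hex(pic_veneer_target(0xc0008208, 0x013d7d31)))

# Hypothetical: image executing 0x40000000 below its link address (MMU off).
delta = -0x40000000
run_addr = (0xc0008208 + delta) & MASK32
print(hex(pic_veneer_target(run_addr, 0x013d7d31)))
```

With the MMU off, the absolute variant still yields the link-time
address, while the PIC variant follows wherever the veneer is actually
executing. The PIC result for the quoted literal comes out as 0xc13dff41
rather than 0xc13dfae1; I assume the two listings come from different
links, so the target moved. Bit 0 set in either result selects Thumb
state at the destination.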

$ size vmlinux
   text   data    bss    dec    hex filename
30038344 13868020 9613876 53520240 330a770 vmlinux

$ grep veneer System.map |wc -l
2211
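
For a rough feel of why so many veneers show up, the sketch below checks
whether a direct branch can reach its destination. It is an illustration
in Python, not the linker's actual logic, and the function name is made
up. The ranges are the architectural ones: a Thumb-2 b.w/bl encodes a
25-bit signed offset relative to the instruction address plus 4, an ARM
b/bl a 26-bit signed offset relative to the instruction address plus 8.

```python
def in_branch_range(src, dst, thumb):
    """Return True if a direct b/bl at src can reach dst without a veneer."""
    if thumb:
        offset = dst - (src + 4)          # Thumb PC = insn address + 4
        return -(1 << 24) <= offset <= (1 << 24) - 2   # about +/-16 MiB
    offset = dst - (src + 8)              # ARM PC = insn address + 8
    return -(1 << 25) <= offset <= (1 << 25) - 4       # about +/-32 MiB

# The b.w in __enable_mmu and the __turn_mmu_on address quoted above.
src, dst = 0xc0008170, 0xc13dfae0
print(in_branch_range(src, dst, thumb=True))
print(in_branch_range(src, dst, thumb=False))

# The text size reported by `size vmlinux` above.
text_size = 30038344
print(text_size > (1 << 24))
```

The ARM-mode check reuses the Thumb2 addresses purely for comparison; an
ARM build of the same configuration would be laid out differently, so
treat it as an order-of-magnitude argument only.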

Note that this is a Thumb2 kernel, and we may have some diminishing
returns here due to the reduced reach of the Thumb2 b/bl instructions.
Also, loading modules is going to be difficult without my PLT patch