[RFC PATCH] arm: decompressor: initialize PIC offset base register for uClinux tools

Nicolas Pitre nico at fluxnic.net
Fri Feb 1 13:07:51 EST 2013


On Fri, 1 Feb 2013, Jonathan Austin wrote:

> Hi Nicolas, thanks for the comments,
> 
> On 29/01/13 20:13, Nicolas Pitre wrote:
> > On Tue, 29 Jan 2013, Jonathan Austin wrote:
> > 
> >> Before jumping to (position independent) C-code from the decompressor's
> >> assembler world we set-up the C environment. This setup currently does not
> >> set r9, which for arm-none-uclinux-uclibceabi should be the PIC offset base
> >> register (IE should point to the beginning of the GOT).
> >>
> >> Currently, therefore, in order to build working kernels that use the
> >> decompressor it is necessary to use an arm-linux-gnueabi toolchain, or
> >> similar. uClinux toolchains cause a Prefetch Abort to occur at the beginning
> >> of the decompress_kernel function.
> >>
> >> This patch allows uClinux toolchains to build bootable zImages by setting r9
> >> to the beginning of the GOT when __uClinux__ is defined, allowing the
> >> decompressor's C functions to work correctly.
> >>
> >> Signed-off-by: Jonathan Austin <jonathan.austin at arm.com>
> >> ---
> >>
> >> One other possibility would be to specify -mno-single-pic-base when building
> >> the decompressor. This works around the problem, but forces the compiler to
> >> generate less optimal code.
> > 
> > How "less optimal"?  How much bigger/slower is it?
> > If not significant enough then going with -mno-single-pic-base might be
> > fine.
> 
> Code that needs to access anything global will need to derive the location
> of the GOT for itself, but there's a possible upside there that there's an
> extra free register (r9 can be used as a general purpose register...)

We try to minimize those in order to perform the easy relocation trick 
which requires no reference to global initialized data.  Hence this in 
the linker script:

  /DISCARD/ : {
    *(.ARM.exidx*)
    *(.ARM.extab*)
    /*
     * Discard any r/w data - this produces a link error if we have any,
     * which is required for PIC decompression.  Local data generates
     * GOTOFF relocations, which prevents it being relocated independently
     * of the text/got segments.
     */
    *(.data)
  }

> The patch would look like:
> -----8<-------
> diff --git a/arch/arm/boot/compressed/Makefile b/arch/arm/boot/compressed/Makefile
> index 5cad8a6..afed28e 100644
> --- a/arch/arm/boot/compressed/Makefile
> +++ b/arch/arm/boot/compressed/Makefile
> @@ -120,7 +120,7 @@ ORIG_CFLAGS := $(KBUILD_CFLAGS)
>  KBUILD_CFLAGS = $(subst -pg, , $(ORIG_CFLAGS))
>  endif
>  -ccflags-y := -fpic -fno-builtin -I$(obj)
> +ccflags-y := -fpic -mno-single-pic-base -fno-builtin -I$(obj)
>  asflags-y := -Wa,-march=all -DZIMAGE
>   # Supply kernel BSS size to the decompressor via a linker symbol.
> ------>8---------
> 
> 
> I did a fairly crude benchmark - count how many instructions we need in
> order to finish decompressing the kernel...
> 
> Setup r9 correctly:       129,976,282
> Use -mno-single-pic-base: 124,826,778
> 
> (this was done using an R-class model and a magic semi-hosting call to pause
> the model at the end of the decompress_kernel function)
> 
> So, it seems like the extra register means there's actually a 4% *win* 
> in instruction terms from using -mno-single-pic-base

Looks like you have a winner.

Acked-by: Nicolas Pitre <nico at linaro.org>

> That said, I've still made some comments/amendments below...
> 
> > 
> >>   arch/arm/boot/compressed/head.S |    4 ++++
> >>   1 file changed, 4 insertions(+)
> >>
> >> diff --git a/arch/arm/boot/compressed/head.S b/arch/arm/boot/compressed/head.S
> >> index fe4d9c3..4491e75 100644
> >> --- a/arch/arm/boot/compressed/head.S
> >> +++ b/arch/arm/boot/compressed/head.S
> >> @@ -410,6 +410,10 @@ wont_overwrite:
> >>    *   sp  = stack pointer
> >>    */
> >>   		orrs	r1, r0, r5
> >> +#ifdef __uClinux__
> >> +		mov	r9, r11			@ PIC offset base register
> >> +		addne	r9, r9, r0		@ Also needs relocating
> >> +#endif
> >>   		beq	not_relocated
> > 
> > Please don't insert your code between the orrs and the beq as those two
> > go logically together.
> 
> I'd initially done this in order to change only one site - as we need to
> set r9 and then add the offset I was using the condition code to test r0...
> 
> However, this was silly - I think I can just do it in one instruction:
> 
> add   r9, r11, r0
> 
> In the case that we're not relocated, r0 should be 0 anyway...
> 
> > 
> > In fact, the best location for this would probably be between the
> > wont_overwrite label and the comment that immediately follows it. And
> > then, those comments that follow until the branch into C code should be
> > updated accordingly.
> 
> 
> Okay, assuming I've understood you correctly, you're suggesting something
> like this:
> 
> -----8<-------
> 
> diff --git a/arch/arm/boot/compressed/head.S b/arch/arm/boot/compressed/head.S
> index fe4d9c3..d81efbd 100644
> --- a/arch/arm/boot/compressed/head.S
> +++ b/arch/arm/boot/compressed/head.S
> @@ -396,6 +396,9 @@ dtb_check_done:
>                 mov     pc, r0
>   wont_overwrite:
> +#ifdef __uClinux__
> +               add     r9, r11, r0             @ uClinux PIC offset base register
> +#endif
>  /*
>   * If delta is zero, we are running at the address we were linked at.
>   *   r0  = delta
> @@ -405,6 +408,7 @@ wont_overwrite:
>   *   r5  = appended dtb size (0 if not present)
>   *   r7  = architecture ID
>   *   r8  = atags pointer
> + *   r9  = GOT start (for uClinux ABI), relocated
>   *   r11 = GOT start
>   *   r12 = GOT end
>   *   sp  = stack pointer
> @@ -470,6 +474,7 @@ not_relocated:      mov     r0, #0
>   *   r4  = kernel execution address
>   *   r7  = architecture ID
>   *   r8  = atags pointer
> + *   r9  = GOT start (for uClinux ABI)
>   */
>                 mov     r0, r4
>                 mov     r1, sp                  @ malloc space above stack
> ------->8-----------

Yes, that's what I was suggesting.

> The question that now occurs is whether we should just set r9 whether or not
> we're using a uClinux toolchain - I don't think it is going to hurt as the
> arm-linux-gnueabi world can happily clobber it with no bad consequences...
> 
> But after all this, it seems that just using -mno-single-pic base as in the patch
> above is best...

Indeed.  As longas this option is compatible with all toolchains.


Nicolas



More information about the linux-arm-kernel mailing list