[RFC PATCH] arm: decompressor: initialize PIC offset base register for uClinux tools
Nicolas Pitre
nico at fluxnic.net
Fri Feb 1 13:07:51 EST 2013
On Fri, 1 Feb 2013, Jonathan Austin wrote:
> Hi Nicolas, thanks for the comments,
>
> On 29/01/13 20:13, Nicolas Pitre wrote:
> > On Tue, 29 Jan 2013, Jonathan Austin wrote:
> >
> >> Before jumping to (position independent) C-code from the decompressor's
> >> assembler world we set-up the C environment. This setup currently does not
> >> set r9, which for arm-none-uclinux-uclibceabi should be the PIC offset base
> >> register (IE should point to the beginning of the GOT).
> >>
> >> Currently, therefore, in order to build working kernels that use the
> >> decompressor it is necessary to use an arm-linux-gnueabi toolchain, or
> >> similar. uClinux toolchains cause a Prefetch Abort to occur at the beginning
> >> of the decompress_kernel function.
> >>
> >> This patch allows uClinux toolchains to build bootable zImages by setting r9
> >> to the beginning of the GOT when __uClinux__ is defined, allowing the
> >> decompressor's C functions to work correctly.
> >>
> >> Signed-off-by: Jonathan Austin <jonathan.austin at arm.com>
> >> ---
> >>
> >> One other possibility would be to specify -mno-single-pic-base when building
> >> the decompressor. This works around the problem, but forces the compiler to
> >> generate less optimal code.
> >
> > How "less optimal"? How much bigger/slower is it?
> > If not significant enough then going with -mno-single-pic-base might be
> > fine.
>
> Code that needs to access anything global will need to derive the location
> of the GOT for itself, but there's a possible upside there that there's an
> extra free register (r9 can be used as a general purpose register...)
We try to minimize those in order to perform the easy relocation trick
which requires no reference to global initialized data. Hence this in
the linker script:
/DISCARD/ : {
*(.ARM.exidx*)
*(.ARM.extab*)
/*
* Discard any r/w data - this produces a link error if we have any,
* which is required for PIC decompression. Local data generates
* GOTOFF relocations, which prevents it being relocated independently
* of the text/got segments.
*/
*(.data)
}
> The patch would look like:
> -----8<-------
> diff --git a/arch/arm/boot/compressed/Makefile b/arch/arm/boot/compressed/Makefile
> index 5cad8a6..afed28e 100644
> --- a/arch/arm/boot/compressed/Makefile
> +++ b/arch/arm/boot/compressed/Makefile
> @@ -120,7 +120,7 @@ ORIG_CFLAGS := $(KBUILD_CFLAGS)
> KBUILD_CFLAGS = $(subst -pg, , $(ORIG_CFLAGS))
> endif
> -ccflags-y := -fpic -fno-builtin -I$(obj)
> +ccflags-y := -fpic -mno-single-pic-base -fno-builtin -I$(obj)
> asflags-y := -Wa,-march=all -DZIMAGE
> # Supply kernel BSS size to the decompressor via a linker symbol.
> ------>8---------
>
>
> I did a fairly crude benchmark - count how many instructions we need in
> order to finish decompressing the kernel...
>
> Setup r9 correctly: 129,976,282
> Use -mno-single-pic-base: 124,826,778
>
> (this was done using an R-class model and a magic semi-hosting call to pause
> the model at the end of the decompress_kernel function)
>
> So, it seems like the extra register means there's actually a 4% *win*
> in instruction terms from using -mno-single-pic-base
Looks like you have a winner.
Acked-by: Nicolas Pitre <nico at linaro.org>
> That said, I've still made some comments/amendments below...
>
> >
> >> arch/arm/boot/compressed/head.S | 4 ++++
> >> 1 file changed, 4 insertions(+)
> >>
> >> diff --git a/arch/arm/boot/compressed/head.S b/arch/arm/boot/compressed/head.S
> >> index fe4d9c3..4491e75 100644
> >> --- a/arch/arm/boot/compressed/head.S
> >> +++ b/arch/arm/boot/compressed/head.S
> >> @@ -410,6 +410,10 @@ wont_overwrite:
> >> * sp = stack pointer
> >> */
> >> orrs r1, r0, r5
> >> +#ifdef __uClinux__
> >> + mov r9, r11 @ PIC offset base register
> >> + addne r9, r9, r0 @ Also needs relocating
> >> +#endif
> >> beq not_relocated
> >
> > Please don't insert your code between the orrs and the beq as those two
> > go logically together.
>
> I'd initially done this in order to change only one site - as we need to
> set r9 and then add the offset I was using the condition code to test r0...
>
> However, this was silly - I think I can just do it in one instruction:
>
> add r9, r11, r0
>
> In the case that we're not relocated, r0 should be 0 anyway...
>
> >
> > In fact, the best location for this would probably be between the
> > wont_overwrite label and the comment that immediately follows it. And
> > then, those comments that follow until the branch into C code should be
> > updated accordingly.
>
>
> Okay, assuming I've understood you correctly, you're suggesting something
> like this:
>
> -----8<-------
>
> diff --git a/arch/arm/boot/compressed/head.S b/arch/arm/boot/compressed/head.S
> index fe4d9c3..d81efbd 100644
> --- a/arch/arm/boot/compressed/head.S
> +++ b/arch/arm/boot/compressed/head.S
> @@ -396,6 +396,9 @@ dtb_check_done:
> mov pc, r0
> wont_overwrite:
> +#ifdef __uClinux__
> + add r9, r11, r0 @ uClinux PIC offset base register
> +#endif
> /*
> * If delta is zero, we are running at the address we were linked at.
> * r0 = delta
> @@ -405,6 +408,7 @@ wont_overwrite:
> * r5 = appended dtb size (0 if not present)
> * r7 = architecture ID
> * r8 = atags pointer
> + * r9 = GOT start (for uClinux ABI), relocated
> * r11 = GOT start
> * r12 = GOT end
> * sp = stack pointer
> @@ -470,6 +474,7 @@ not_relocated: mov r0, #0
> * r4 = kernel execution address
> * r7 = architecture ID
> * r8 = atags pointer
> + * r9 = GOT start (for uClinux ABI)
> */
> mov r0, r4
> mov r1, sp @ malloc space above stack
> ------->8-----------
Yes, that's what I was suggesting.
> The question that now occurs is whether we should just set r9 whether or not
> we're using a uClinux toolchain - I don't think it is going to hurt as the
> arm-linux-gnueabi world can happily clobber it with no bad consequences...
>
> But after all this, it seems that just using -mno-single-pic base as in the patch
> above is best...
Indeed. As longas this option is compatible with all toolchains.
Nicolas
More information about the linux-arm-kernel
mailing list