[RFC PATCH] arm: decompressor: initialize PIC offset base register for uClinux tools
Jonathan Austin
jonathan.austin at arm.com
Fri Feb 1 11:43:31 EST 2013
Hi Nicolas, thanks for the comments,
On 29/01/13 20:13, Nicolas Pitre wrote:
> On Tue, 29 Jan 2013, Jonathan Austin wrote:
>
>> Before jumping to (position independent) C-code from the decompressor's
>> assembler world we set-up the C environment. This setup currently does not
>> set r9, which for arm-none-uclinux-uclibceabi should be the PIC offset base
>> register (IE should point to the beginning of the GOT).
>>
>> Currently, therefore, in order to build working kernels that use the
>> decompressor it is necessary to use an arm-linux-gnueabi toolchain, or
>> similar. uClinux toolchains cause a Prefetch Abort to occur at the beginning
>> of the decompress_kernel function.
>>
>> This patch allows uClinux toolchains to build bootable zImages by setting r9
>> to the beginning of the GOT when __uClinux__ is defined, allowing the
>> decompressor's C functions to work correctly.
>>
>> Signed-off-by: Jonathan Austin <jonathan.austin at arm.com>
>> ---
>>
>> One other possibility would be to specify -mno-single-pic-base when building
>> the decompressor. This works around the problem, but forces the compiler to
>> generate less optimal code.
>
> How "less optimal"? How much bigger/slower is it?
> If not significant enough then going with -mno-single-pic-base might be
> fine.
Code that needs to access anything global will need to derive the location
of the GOT for itself, but there's a possible upside there that there's an
extra free register (r9 can be used as a general purpose register...)
The patch would look like:
-----8<-------
diff --git a/arch/arm/boot/compressed/Makefile b/arch/arm/boot/compressed/Makefile
index 5cad8a6..afed28e 100644
--- a/arch/arm/boot/compressed/Makefile
+++ b/arch/arm/boot/compressed/Makefile
@@ -120,7 +120,7 @@ ORIG_CFLAGS := $(KBUILD_CFLAGS)
KBUILD_CFLAGS = $(subst -pg, , $(ORIG_CFLAGS))
endif
-ccflags-y := -fpic -fno-builtin -I$(obj)
+ccflags-y := -fpic -mno-single-pic-base -fno-builtin -I$(obj)
asflags-y := -Wa,-march=all -DZIMAGE
# Supply kernel BSS size to the decompressor via a linker symbol.
------>8---------
I did a fairly crude benchmark - count how many instructions we need in
order to finish decompressing the kernel...
Setup r9 correctly: 129,976,282
Use -mno-single-pic-base: 124,826,778
(this was done using an R-class model and a magic semi-hosting call to pause
the model at the end of the decompress_kernel function)
So, it seems like the extra register means there's actually a 4% *win*
in instruction terms from using -mno-single-pic-base
That said, I've still made some comments/amendments below...
>
>> arch/arm/boot/compressed/head.S | 4 ++++
>> 1 file changed, 4 insertions(+)
>>
>> diff --git a/arch/arm/boot/compressed/head.S b/arch/arm/boot/compressed/head.S
>> index fe4d9c3..4491e75 100644
>> --- a/arch/arm/boot/compressed/head.S
>> +++ b/arch/arm/boot/compressed/head.S
>> @@ -410,6 +410,10 @@ wont_overwrite:
>> * sp = stack pointer
>> */
>> orrs r1, r0, r5
>> +#ifdef __uClinux__
>> + mov r9, r11 @ PIC offset base register
>> + addne r9, r9, r0 @ Also needs relocating
>> +#endif
>> beq not_relocated
>
> Please don't insert your code between the orrs and the beq as those two
> go logically together.
I'd initially done this in order to change only one site - as we need to
set r9 and then add the offset I was using the condition code to test r0...
However, this was silly - I think I can just do it in one instruction:
add r9, r11, r0
In the case that we're not relocated, r0 should be 0 anyway...
>
> In fact, the best location for this would probably be between the
> wont_overwrite label and the comment that immediately follows it. And
> then, those comments that follow until the branch into C code should be
> updated accordingly.
Okay, assuming I've understood you correctly, you're suggesting something
like this:
-----8<-------
diff --git a/arch/arm/boot/compressed/head.S b/arch/arm/boot/compressed/head.S
index fe4d9c3..d81efbd 100644
--- a/arch/arm/boot/compressed/head.S
+++ b/arch/arm/boot/compressed/head.S
@@ -396,6 +396,9 @@ dtb_check_done:
mov pc, r0
wont_overwrite:
+#ifdef __uClinux__
+ add r9, r11, r0 @ uClinux PIC offset base register
+#endif
/*
* If delta is zero, we are running at the address we were linked at.
* r0 = delta
@@ -405,6 +408,7 @@ wont_overwrite:
* r5 = appended dtb size (0 if not present)
* r7 = architecture ID
* r8 = atags pointer
+ * r9 = GOT start (for uClinux ABI), relocated
* r11 = GOT start
* r12 = GOT end
* sp = stack pointer
@@ -470,6 +474,7 @@ not_relocated: mov r0, #0
* r4 = kernel execution address
* r7 = architecture ID
* r8 = atags pointer
+ * r9 = GOT start (for uClinux ABI)
*/
mov r0, r4
mov r1, sp @ malloc space above stack
------->8-----------
The question that now occurs is whether we should just set r9 whether or not
we're using a uClinux toolchain - I don't think it is going to hurt as the
arm-linux-gnueabi world can happily clobber it with no bad consequences...
But after all this, it seems that just using -mno-single-pic base as in the patch
above is best...
Thoughts?
Jonny
More information about the linux-arm-kernel
mailing list