[RFT/RFC PATCH 3/6] ARM: add macro to perform far branches (b/bl)
Ard Biesheuvel
ard.biesheuvel at linaro.org
Thu Mar 12 14:37:56 PDT 2015
On 12 March 2015 at 22:15, Ard Biesheuvel <ard.biesheuvel at linaro.org> wrote:
> On 12 March 2015 at 22:03, Nicolas Pitre <nicolas.pitre at linaro.org> wrote:
>> On Thu, 12 Mar 2015, Ard Biesheuvel wrote:
>>
>>> On 12 March 2015 at 21:32, Nicolas Pitre <nicolas.pitre at linaro.org> wrote:
>>> > On Thu, 12 Mar 2015, Ard Biesheuvel wrote:
>>> >
>>> >> These macros execute PC-relative branches, but with a larger
>>> >> reach than the 24 bits that are available in the b and bl opcodes.
>>> >>
>>> >> Signed-off-by: Ard Biesheuvel <ard.biesheuvel at linaro.org>
>>> >> ---
>>> >> arch/arm/include/asm/assembler.h | 29 +++++++++++++++++++++++++++++
>>> >> 1 file changed, 29 insertions(+)
>>> >>
>>> >> diff --git a/arch/arm/include/asm/assembler.h b/arch/arm/include/asm/assembler.h
>>> >> index f67fd3afebdf..bd08c3c1b73f 100644
>>> >> --- a/arch/arm/include/asm/assembler.h
>>> >> +++ b/arch/arm/include/asm/assembler.h
>>> >> @@ -108,6 +108,35 @@
>>> >> .endm
>>> >> #endif
>>> >>
>>> >> + /*
>>> >> + * Macros to emit relative branches that may exceed the range
>>> >> + * of the 24-bit immediate of the ordinary b/bl instructions.
>>> >> + * NOTE: this doesn't work with locally defined symbols, as they
>>> >> + * might lack the ARM/Thumb annotation (even if they are annotated
>>> >> + * as functions)
>>> >
>>> > I really hope you won't need a far call with local symbols ever!
>>> >
>>>
>>> Well, if you use pushsection/popsection, then local, numbered labels
>>> you refer to can be quite far away in the output image, and those will
>>> not have the thumb bit set.
>>
>> Indeed.
>>
>>> >> + */
>>> >> + .macro b_far, target, tmpreg
>>> >> +#if defined(CONFIG_CPU_32v7) || defined(CONFIG_CPU_32v7M)
>>> >> + ARM( movw \tmpreg, #:lower16:(\target - (8888f + 8)) )
>>> >> + ARM( movt \tmpreg, #:upper16:(\target - (8888f + 8)) )
>>> >> + THUMB( movw \tmpreg, #:lower16:(\target - (8888f + 4)) )
>>> >> + THUMB( movt \tmpreg, #:upper16:(\target - (8888f + 4)) )
>>> >> +8888: add pc, pc, \tmpreg
>>> >> +#else
>>> >> + ldr \tmpreg, 8889f
>>> >> +8888: add pc, pc, \tmpreg
>>> >> + .align 2
>>> >> +8889:
>>> >> + ARM( .word \target - (8888b + 8) )
>>> >
>>> > The Thumb relocation value is missing here.
>>> >
>>>
>>> Yes, this is bogus. But Thumb2 implies v7 or v7m, so it is not
>>> actually incorrect in this case.
>>
>> The ".align 2" would be redundant in that case too.
>>
>
> Correct, the #else bit is essentially ARM only
>
>>> But I will fix it in the next version
>>
>> Is it worth optimizing the ARM mode with movw/movt on ARMv7? If not
>> then this could be simplified as only:
>>
>> .macro b_far, target, tmpreg
>> THUMB( movw \tmpreg, #:lower16:(\target - (8888f + 4)) )
>> THUMB( movt \tmpreg, #:upper16:(\target - (8888f + 4)) )
>> ARM( ldr \tmpreg, 8888f+4 )
>> 8888: add pc, pc, \tmpreg
>> ARM( .word \target - (8888b + 8) )
>> .endm
>>
>
> movw/movt is preferred if available, since it circumvents the D-cache.
> And actually, I should rewrite the bl_far macro for v7 to use blx
> instead of adr+ldr to make better use of the return stack predictor or
> whatever it is called in the h/w
>
> And, as Russell points out, I should put a PC_BIAS #define somewhere
> that assumes the correct value for the used mode, instead of the +4/+8
> immediates.
>
> So I am thinking along the lines of
>
> .macro b_far, target, tmpreg
> #if defined(CONFIG_CPU_32v7) || defined(CONFIG_CPU_32v7M)
> movw \tmpreg, #:lower16:(\target - (8888f + PC_BIAS))
> movt \tmpreg, #:upper16:(\target - (8888f + PC_BIAS))
> 8888: add pc, pc, \tmpreg
> #else
> ldr \tmpreg, =\target - (8888f + PC_BIAS)

Replying to self: this doesn't work:

/home/ard/linux-2.6/arch/arm/kernel/sleep.S: Assembler messages:
/home/ard/linux-2.6/arch/arm/kernel/sleep.S:131: Error: constant expression expected -- `ldr ip,=__hyp_stub_install_secondary-8888f+4'

So the only way this is feasible is with an explicit literal, which
kind of sucks indeed for D-cache performance.

Any other ideas?

> 8888: add pc, pc, \tmpreg
> #endif
> .endm
>
> .macro bl_far, target, tmpreg=ip
> #if defined(CONFIG_CPU_32v7) || defined(CONFIG_CPU_32v7M)
> movw \tmpreg, #:lower16:(\target - (8887f + PC_BIAS))
> movt \tmpreg, #:upper16:(\target - (8887f + PC_BIAS))
> 8887: add \tmpreg, \tmpreg, pc
> blx \tmpreg
> #else
> adr lr, BSYM(8887f)
> b_far \target, \tmpreg
> 8887:
> #endif
> .endm
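
For anyone who wants to check the offset arithmetic without an assembler,
here is a rough Python model (not kernel code) of what the v7 path of
b_far computes. It assumes PC_BIAS is 8 in ARM mode and 4 in Thumb mode,
matching the +8/+4 immediates used earlier in the thread. The function
names and the addresses in the usage note are made up for illustration,
and the model ignores the Thumb bit of the target. Since movw writes a
zero-extended immediate (clearing the top half) while movt only replaces
the top half, the model issues movw first.

```python
MASK32 = 0xFFFFFFFF


def pc_bias(thumb: bool) -> int:
    """Assumed PC read-ahead: 8 bytes in ARM mode, 4 bytes in Thumb mode."""
    return 4 if thumb else 8


def movw(imm16: int) -> int:
    """movw writes a zero-extended 16-bit immediate, clearing the top half."""
    return imm16 & 0xFFFF


def movt(reg: int, imm16: int) -> int:
    """movt replaces only the top half and keeps the bottom half."""
    return ((imm16 & 0xFFFF) << 16) | (reg & 0xFFFF)


def b_far_destination(target: int, add_addr: int, thumb: bool = False) -> int:
    """Model movw/movt followed by 'add pc, pc, tmpreg' placed at add_addr."""
    # Offset from the PC value the add instruction reads to the target.
    offset = (target - (add_addr + pc_bias(thumb))) & MASK32
    tmpreg = movw(offset & 0xFFFF)       # :lower16: half
    tmpreg = movt(tmpreg, offset >> 16)  # :upper16: half
    return (add_addr + pc_bias(thumb) + tmpreg) & MASK32


def in_b_range(target: int, branch_addr: int) -> bool:
    """True if an ARM-mode b/bl at branch_addr could reach target directly."""
    # Signed 24-bit word offset relative to branch_addr + 8, i.e. +/- 32 MB.
    offset = target - (branch_addr + 8)
    return offset % 4 == 0 and -(1 << 25) <= offset < (1 << 25)
```

With an add at 0x80300000 and a target at 0xC0008000, in_b_range() reports
that a plain b cannot reach the target, while b_far_destination() lands on
it in both ARM and Thumb mode.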