[RFC] ARM: lib: delay-loop: Add align directive to fix BogoMIPS calculation

Fabio Estevam festevam at gmail.com
Fri Nov 29 06:02:05 EST 2013


Hi Russell,

On Fri, Nov 22, 2013 at 9:53 AM, Fabio Estevam <festevam at gmail.com> wrote:
> From: Fabio Estevam <fabio.estevam at freescale.com>
>
> Currently, mx53 (Cortex-A8) running at 1GHz reports:
> Calibrating delay loop... 663.55 BogoMIPS (lpj=3317760)
>
> Tom Evans verified that alignments of 0x0 and 0x8 run the two instructions of
> __loop_delay in one clock cycle (1 clock/loop), while alignments of 0x4 and
> 0xc take 3 clocks to run the loop twice (1.5 clocks/loop).
>
> The original object code looks like this:
>
> 00000010 <__loop_const_udelay>:
>   10:   e3e01000        mvn     r1, #0
>   14:   e51f201c        ldr     r2, [pc, #-28]  ; 0 <__loop_udelay-0x8>
>   18:   e5922000        ldr     r2, [r2]
>   1c:   e0800921        add     r0, r0, r1, lsr #18
>   20:   e1a00720        lsr     r0, r0, #14
>   24:   e0822b21        add     r2, r2, r1, lsr #22
>   28:   e1a02522        lsr     r2, r2, #10
>   2c:   e0000092        mul     r0, r2, r0
>   30:   e0800d21        add     r0, r0, r1, lsr #26
>   34:   e1b00320        lsrs    r0, r0, #6
>   38:   01a0f00e        moveq   pc, lr
>
> 0000003c <__loop_delay>:
>   3c:   e2500001        subs    r0, r0, #1
>   40:   8afffffe        bhi     3c <__loop_delay>
>   44:   e1a0f00e        mov     pc, lr
>
> After adding the '.align 3' directive to __loop_delay (align to 8 bytes):
>
> 00000010 <__loop_const_udelay>:
>   10:   e3e01000        mvn     r1, #0
>   14:   e51f201c        ldr     r2, [pc, #-28]  ; 0 <__loop_udelay-0x8>
>   18:   e5922000        ldr     r2, [r2]
>   1c:   e0800921        add     r0, r0, r1, lsr #18
>   20:   e1a00720        lsr     r0, r0, #14
>   24:   e0822b21        add     r2, r2, r1, lsr #22
>   28:   e1a02522        lsr     r2, r2, #10
>   2c:   e0000092        mul     r0, r2, r0
>   30:   e0800d21        add     r0, r0, r1, lsr #26
>   34:   e1b00320        lsrs    r0, r0, #6
>   38:   01a0f00e        moveq   pc, lr
>   3c:   e320f000        nop     {0}
>
> 00000040 <__loop_delay>:
>   40:   e2500001        subs    r0, r0, #1
>   44:   8afffffe        bhi     40 <__loop_delay>
>   48:   e1a0f00e        mov     pc, lr
>   4c:   e320f000        nop     {0}
>
> With __loop_delay aligned to 8 bytes, the kernel now reports:
> Calibrating delay loop... 996.14 BogoMIPS (lpj=4980736)
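
The address change in the two listings can be reproduced with a small
sketch. It assumes the ARM assembler convention that an alignment
argument of n means a 2**n-byte boundary, and that the section itself
starts on a 16-byte boundary (the mail does not say where the object is
finally linked). align_up() is a hypothetical helper, not a kernel
function.

```python
def align_up(addr: int, power: int) -> int:
    """Round addr up to the next multiple of 2**power bytes."""
    boundary = 1 << power
    return (addr + boundary - 1) & ~(boundary - 1)

old_addr = 0x3c                    # __loop_delay offset before the patch
new_addr = align_up(old_addr, 3)   # offset with 8-byte alignment requested
padding_bytes = new_addr - old_addr

print(hex(new_addr), padding_bytes)            # 0x40 4
print(hex(old_addr % 16), hex(new_addr % 16))  # 0xc 0x0
```

The 4 bytes of padding match the single nop at 0x3c in the second
listing, and the offset modulo 16 moves from 0xc (one of the slow cases
quoted above) to 0x0 (one of the fast ones).
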
>
> Some more test results:
>
> On mx31 (ARM1136) running at 532 MHz, before the patch:
> Calibrating delay loop... 351.43 BogoMIPS (lpj=1757184)
>
> On mx31 (ARM1136) running at 532 MHz after the patch:
> Calibrating delay loop... 528.79 BogoMIPS (lpj=2643968)
>
> Also tested on mx6 (Cortex-A9) and on mx27 (ARM926), which show the same
> BogoMIPS value before and after this patch.
>
> Reported-by: Tom Evans <tom_usenet at optusnet.com.au>
> Suggested-by: Tom Evans <tom_usenet at optusnet.com.au>
> Signed-off-by: Fabio Estevam <fabio.estevam at freescale.com>
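
For anyone wanting to cross-check the quoted figures, here is a minimal
sketch of the lpj to BogoMIPS arithmetic. It follows the integer
division, as far as I can tell, that the calibration code uses when
printing the value. HZ=100 is an assumption: the mail does not state it,
but it reproduces all four printed values. bogomips() is a hypothetical
helper name.

```python
HZ = 100  # assumed tick rate; not stated in the mail

def bogomips(lpj: int, hz: int = HZ) -> str:
    """Format loops-per-jiffy as 'whole.frac' using integer arithmetic."""
    whole = lpj // (500000 // hz)
    frac = (lpj // (5000 // hz)) % 100
    return f"{whole}.{frac:02d}"

# lpj values quoted above: (before patch, after patch)
results = {
    "mx53": (3317760, 4980736),
    "mx31": (1757184, 2643968),
}

ratios = {}
for board, (before, after) in results.items():
    ratios[board] = after / before
    print(board, bogomips(before), bogomips(after), round(ratios[board], 3))
```

Both boards speed up by a factor of about 1.5, which is the same ratio
as the 1.5 clocks/loop versus 1 clock/loop measurement quoted above.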

Any comments on this, please?

Regards,

Fabio Estevam


