Code generation involving __raw_readl and __raw_writel
Willy Tarreau
w at 1wt.eu
Thu Nov 27 03:09:50 PST 2014
Hi,
On Thu, Nov 27, 2014 at 11:40:34AM +0100, Mason wrote:
> Hello everyone,
>
> Consider the following code (preprocessor output):
>
> static int tangox_target(struct cpufreq_policy *policy, unsigned int index)
> {
> while (__raw_readl((volatile void *)(0xf0000000 +(0x10024))) >> 31);
> __raw_writel(0, (volatile void *)(0xf0000000 +(0x10024)));
> return 0;
> }
>
> gcc generates the following code:
> (version and command-line in sig below)
>
> 00000014 <tangox_target>:
> 14: e3a03000 mov r3, #0
> 18: e34f3001 movt r3, #61441 ; 0xf001
> 1c: e3a02000 mov r2, #0
> 20: e34f2001 movt r2, #61441 ; 0xf001
> 24: e5931024 ldr r1, [r3, #36] ; 0x24
> 28: e3510000 cmp r1, #0
> 2c: bafffffa blt 1c <tangox_target+0x8>
> 30: e3a00000 mov r0, #0
> 34: e5820024 str r0, [r2, #36] ; 0x24
> 38: e12fff1e bx lr
>
> Do you know why gcc duplicates the address in r2 and r3?
> And keeps putting the address in r2 over and over in the loop?
I'm used to seeing crap like this all the time, which is why I
*always* look at the assembly code for any performance-sensitive
section. In general, I try hard to help gcc do the right thing,
or at least make it harder for it to do the wrong thing. Yes,
that's painful. But with a bit of practice it becomes automatic
and you don't think about it anymore.
In the code above, it's very likely that if you compute the register
address once into a local variable and use that variable for both the
read and the write, gcc will stop duplicating it.
Hoping this helps,
Willy