[PATCH] ARM: implement optimized percpu variable access

Will Deacon will.deacon at arm.com
Tue Nov 27 08:17:42 EST 2012


On Mon, Nov 26, 2012 at 05:30:55PM +0000, Rob Herring wrote:
> On 11/26/2012 09:15 AM, Will Deacon wrote:
> > The problem here is that, because our CPU accessors don't actually make any
> > memory references, the barrier() has no effect and the old value is just
> > reloaded off the stack:
> > 
> >   c02c1f22:       f54a fe49       bl      c000cbb8 <__switch_to>
> >   c02c1f26:       4601            mov     r1, r0
> >   c02c1f28:       68f8            ldr     r0, [r7, #12]
> >   c02c1f2a:       f56f ffd5       bl      c0031ed8 <finish_task_switch>
> > 
> > which obviously causes complete chaos if the new task has been pulled from
> > a different runqueue! (this appears as a double spin unlock on rq->lock).
> > 
> > Fixing this without giving up the performance improvement we gain by *avoiding*
> > the memory access in the first place is going to be tricky...
> 
> What compiler and config are you using? I get a reload of the register here:
> 
> c0350fba:       f001 fa9d       bl      c03524f8 <__switch_to>
> c0350fbe:       4601            mov     r1, r0
> c0350fc0:       ee1d 0f90       mrc     15, 0, r0, cr13, cr0, {4}
> c0350fc4:       4c2a            ldr     r4, [pc, #168]  ; (c0351070 <__schedule+0x390>)
> c0350fc6:       f649 1590       movw    r5, #39312      ; 0x9990
> c0350fca:       1900            adds    r0, r0, r4
> c0350fcc:       f2cc 0556       movt    r5, #49238      ; 0xc056
> c0350fd0:       f4e7 fd56       bl      c0038a80 <finish_task_switch>

I tried both Linaro 12.07 and 12.10 GCC builds, although the problem would
only occur if I did a make clean and then a fresh build on top of that.
Just building the relevant object files didn't seem to tickle the problem.
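
For anyone following along, here is a rough sketch of the barrier() issue
quoted above, and of one possible way to hazard against it by giving the asm
a dummy memory input. This is only an illustration, not the accessor from the
patch: the function names are made up, the asm body is left empty so it
builds on any GCC or Clang target (on ARM it would be the mrc read shown in
the disassembly), and the "register" value is passed in as an argument.

```c
#include <assert.h>

/* Compiler-only barrier: clobbers "memory" but emits no instruction. */
#define barrier() __asm__ __volatile__("" : : : "memory")

/*
 * Hypothetical stand-in for the per-CPU offset accessor. The asm has no
 * memory operand and is not volatile, so barrier() places no constraint
 * on it and the compiler is free to reuse an earlier result.
 */
static inline unsigned long cpu_offset_no_hazard(unsigned long reg)
{
	unsigned long off;

	__asm__("" : "=r" (off) : "0" (reg));
	return off;
}

/*
 * Same accessor with a dummy "m" input (an assumption for this sketch).
 * The asm now counts as a memory read, so a result computed before a
 * barrier() cannot simply be reused after it.
 */
static inline unsigned long cpu_offset_with_hazard(unsigned long reg,
						   const unsigned long *hazard)
{
	unsigned long off;

	__asm__("" : "=r" (off) : "0" (reg), "m" (*hazard));
	return off;
}

/* Usage sketch: the same read on both sides of a barrier(). */
static unsigned long offset_after_barrier(unsigned long reg)
{
	unsigned long slot = 0;	/* dummy location used only as the hazard */
	unsigned long before = cpu_offset_with_hazard(reg, &slot);

	barrier();
	/* The "m" operand means this read is redone after the barrier. */
	unsigned long after = cpu_offset_with_hazard(reg, &slot);

	assert(before == after);	/* trivially true: reg is a plain argument */
	return after;
}
```

The asm stays non-volatile, so the compiler can still cache the value between
barriers; whether that keeps enough of the performance win would need
measuring.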

I can mail you my .config if you like?

Will


