[PATCH] ARM: implement optimized percpu variable access
Jamie Lokier
jamie at shareable.org
Mon Nov 26 16:58:26 EST 2012
Will Deacon wrote:
> That was a fun bit of debugging -- my hunch was right, but I was looking in the
> wrong place because I had an unrelated problem with my bootloader.
>
> What happens is that every man and his dog is inlined into __schedule,
> including all the runqueue accessors, such as this_rq(), which make use of
> per-cpu offsets to get the correct pointer. The compiler then spits out
> something like this near the start of the function:
>
[...]
>
> so the address of the current runqueue has been calculated and stored, with
> a bunch of other stuff, in a structure on the stack.
>
[...]
>
> barrier();
> /*
> * this_rq must be evaluated again because prev may have moved
> * CPUs since it called schedule(), thus the 'rq' on its stack
> * frame will be invalid.
> */
> finish_task_switch(this_rq(), prev);
>
> The problem here is that, because our CPU accessors don't actually make any
> memory references, the barrier() has no effect and the old value is just
> reloaded off the stack:
>
[...]
>
> which obviously causes complete chaos if the new task has been pulled from
> a different runqueue! (this appears as a double spin unlock on rq->lock).
>
> Fixing this without giving up the performance improvement we gain by *avoiding*
> the memory access in the first place is going to be tricky...
Perhaps look at x86's approach, in arch/x86/include, <asm/current.h>,
<asm/percpu.h> and <asm/switch_to.h>.
current uses this_cpu_read_stable():
/*
* this_cpu_read() makes gcc load the percpu variable every time it is
* accessed while this_cpu_read_stable() allows the value to be cached.
* this_cpu_read_stable() is more efficient and can be used if its value
* is guaranteed to be valid across cpus. The current users include
* get_current() and get_thread_info() both of which are actually
* per-thread variables implemented as per-cpu variables and thus
* stable for the duration of the respective task.
*/
So how does x86 do it?
After lots of staring at the x86 headers, I think
this_cpu_read_stable() is a non-volatile asm which depends only on the
address &current_task, which is constant (GCC doesn't know the address
is dereferenced), and I can't see how reloading the value after
switch_to() (32 or 64 bit versions) is guaranteed, unless the "p" asm
constraint implies something more than "the value is a pointer".
I assume I've missed something very subtle, but if not x86 has the
same bug/issue as discussed in this thread.
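To make the failure mode concrete, here is a minimal user-space sketch of
the hazard Will describes. It is not kernel code: fake_cpu_id,
fake_context_switch(), stale_rq_cpu() and fresh_rq_cpu() are hypothetical
names made up for illustration, and the caching is written out explicitly
rather than left to the compiler, so the result does not depend on the
optimisation level.

```c
#include <assert.h>

#define NR_CPUS 2

struct rq {
	int cpu;	/* which CPU this runqueue belongs to */
};

static struct rq runqueues[NR_CPUS] = { { 0 }, { 1 } };

/*
 * Assumption: an ordinary variable stands in for the hardware register
 * that holds the per-cpu offset. The compiler can see this variable,
 * unlike the real register read done from inline asm.
 */
static int fake_cpu_id;

struct rq *this_rq(void)
{
	assert(fake_cpu_id >= 0 && fake_cpu_id < NR_CPUS);
	return &runqueues[fake_cpu_id];
}

/* Pretend the task was migrated to new_cpu while it was switched out. */
void fake_context_switch(int new_cpu)
{
	fake_cpu_id = new_cpu;
}

/* Reuses the pointer computed before the switch (the buggy pattern). */
int stale_rq_cpu(void)
{
	struct rq *rq;

	fake_cpu_id = 0;
	rq = this_rq();			/* cached "on the stack" */
	fake_context_switch(1);
	return rq->cpu;			/* still CPU 0's runqueue */
}

/* Re-evaluates this_rq() after the switch (the intended pattern). */
int fresh_rq_cpu(void)
{
	fake_cpu_id = 0;
	(void)this_rq();		/* value from before the switch, discarded */
	fake_context_switch(1);
	return this_rq()->cpu;		/* CPU 1's runqueue */
}
```

stale_rq_cpu() hands back the runqueue of the CPU the task left, while
fresh_rq_cpu() hands back the one it now runs on. The bug in this thread
is the compiler silently turning the second form into the first, because
nothing in the asm tells it the result can change across barrier().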
-- Jamie