[PATCH] ARM: pcpu: ensure __my_cpu_offset cannot be re-ordered across barrier()
Nicolas Pitre
nico at fluxnic.net
Tue Jun 4 13:39:42 EDT 2013
On Tue, 4 Jun 2013, Will Deacon wrote:
> On Tue, Jun 04, 2013 at 11:39:53AM +0100, Will Deacon wrote:
> > However, if you instead add a dependency on something that already lives in a
> > register, like __builtin_return_address, you get:
> >
> > c001cd44: e92d4010 push {r4, lr}
> > c001cd48: ee1dcf90 mrc 15, 0, ip, cr13, cr0, {4}
> > c001cd4c: e790410c ldr r4, [r0, ip, lsl #2]
> > c001cd50: ee1dcf90 mrc 15, 0, ip, cr13, cr0, {4}
> > c001cd54: e791110c ldr r1, [r1, ip, lsl #2]
> > c001cd58: e792210c ldr r2, [r2, ip, lsl #2]
> > c001cd5c: ee1d3f90 mrc 15, 0, r3, cr13, cr0, {4}
> > c001cd60: e7903103 ldr r3, [r0, r3, lsl #2]
> > c001cd64: e0840001 add r0, r4, r1
> > c001cd68: e0800002 add r0, r0, r2
> > c001cd6c: e0800003 add r0, r0, r3
> > c001cd70: e8bd8010 pop {r4, pc}
>
> Actually, I've managed to improve this further using sp instead (see diff
> below), which also prevents GCC from moving the lr into other registers for
> no reason. The code generated for your test is:
>
> c001cda4: ee1d3f90 mrc 15, 0, r3, cr13, cr0, {4}
> c001cda8: e790c103 ldr ip, [r0, r3, lsl #2]
> c001cdac: ee1d3f90 mrc 15, 0, r3, cr13, cr0, {4}
> c001cdb0: e7911103 ldr r1, [r1, r3, lsl #2]
> c001cdb4: e7922103 ldr r2, [r2, r3, lsl #2]
> c001cdb8: ee1d3f90 mrc 15, 0, r3, cr13, cr0, {4}
> c001cdbc: e7903103 ldr r3, [r0, r3, lsl #2]
> c001cdc0: e08c0001 add r0, ip, r1
> c001cdc4: e0800002 add r0, r0, r2
> c001cdc8: e0800003 add r0, r0, r3
> c001cdcc: e12fff1e bx lr
Excellent.
> I spoke to the tools guys and they'll look into why the original patch
> didn't work, since that could have wider implications on our use of
> barrier().
Good. However, given the wide range of gcc versions exhibiting the same
behavior, I think your patch is the best way forward even if this turns
out to be a gcc bug that eventually gets fixed.
Reviewed-by: Nicolas Pitre <nico at linaro.org>
I'd suggest queuing it for stable as well.
> --->8
>
> diff --git a/arch/arm/include/asm/percpu.h b/arch/arm/include/asm/percpu.h
> index 968c0a1..209e650 100644
> --- a/arch/arm/include/asm/percpu.h
> +++ b/arch/arm/include/asm/percpu.h
> @@ -30,8 +30,15 @@ static inline void set_my_cpu_offset(unsigned long off)
> static inline unsigned long __my_cpu_offset(void)
> {
> unsigned long off;
> - /* Read TPIDRPRW */
> - asm("mrc p15, 0, %0, c13, c0, 4" : "=r" (off) : : "memory");
> + register unsigned long *sp asm ("sp");
> +
> + /*
> + * Read TPIDRPRW.
> + * We want to allow caching the value, so avoid using volatile and
> + * instead use a fake stack read to hazard against barrier().
> + */
> + asm("mrc p15, 0, %0, c13, c0, 4" : "=r" (off) : "Q" (*sp));
> +
> return off;
> }
> #define __my_cpu_offset __my_cpu_offset()
>