[PATCH] ARM: v6: prevent gcc from reordering extended CP15 reads above is_smp() test

Will Deacon will.deacon at arm.com
Mon Jul 29 06:02:43 EDT 2013


Hi Paul,

On Sun, Jul 28, 2013 at 09:16:29PM +0100, Paul Walmsley wrote:
> 
> Commit 621a0147d5c921f4cc33636ccd0602ad5d7cbfbc ("ARM: 7757/1: mm:
> don't flush icache in switch_mm with hardware broadcasting") breaks
> the boot on OMAP2430SDP with omap2plus_defconfig.  Tracked to an
> undefined instruction abort from the CP15 read in
> cache_ops_need_broadcast().  It turns out that gcc reorders the
> extended CP15 read above the is_smp() test.  This breaks ARM1136 r0
> cores, since they don't support several CP15 registers that later ARM
> cores do.  ARM1136JF-S TRM section 3.2.1 "Register allocation" has the
> details.

Cheers for tracking this down. Interestingly, I can't reproduce this with
anything other than GCC 4.5.* toolchains; 4.6 and later do what we want.
Still, it looks like a valid (if misguided) thing for the compiler to do.

> diff --git a/arch/arm/include/asm/cputype.h b/arch/arm/include/asm/cputype.h
> index 8c25dc4..f428eb0 100644
> --- a/arch/arm/include/asm/cputype.h
> +++ b/arch/arm/include/asm/cputype.h
> @@ -89,13 +89,25 @@ extern unsigned int processor_id;
>  		__val;							\
>  	})
>  
> +
> +# if defined(CONFIG_CPU_V6)
> +/*
> + * The mrc in the read_cpuid_ext macro must not be reordered on ARMv6,
> + * else the compiler may move it before an is_smp() test, causing
> + * undefined instruction aborts on ARM1136 r0.
> + */
> +# define CPUID_EXT_REORDER	"cc", "memory"
> +# else
> +# define CPUID_EXT_REORDER	"cc"
> +# endif
> +
>  #define read_cpuid_ext(ext_reg)						\
>  	({								\
>  		unsigned int __val;					\
>  		asm("mrc	p15, 0, %0, c0, " ext_reg		\
>  		    : "=r" (__val)					\
>  		    :							\
> -		    : "cc");						\
> +		    : CPUID_EXT_REORDER);				\
>  		__val;							\
>  	})

I wouldn't worry about checking for CPU_V6. Besides, once we get CPU
migration on a big.LITTLE platform, we will probably need this read to be
re-evaluated across barrier() anyway (and we should probably also drop the
__attribute_const__ for that).

So you can just replace the "cc" clobber with "memory" (Nico kindly
explained the other day why the "cc" clobbers aren't needed).

An alternative is to add a barrier() between is_smp() and read_cpuid_ext()
in every caller, and to add a fake read from the stack to read_cpuid_ext()
itself so that it hazards against barrier() (like I did for the per-cpu
accessor). However, this relies on fixing all callers for very little gain,
so I don't think it's worth the hassle.

I can cook a patch if you're tied up with other things -- just let me know.

Cheers,

Will


