[PATCH] ARM: v6: avoid read_cpuid_ext() on ARM1136r0 in cache_ops_need_broadcast()

Russell King - ARM Linux linux at arm.linux.org.uk
Sun Jul 28 07:52:20 EDT 2013


On Sun, Jul 28, 2013 at 12:10:38PM +0100, Will Deacon wrote:
> Hi Paul,
> 
> Cheers for sticking with this!
> 
> On Sun, Jul 28, 2013 at 06:43:24AM +0100, Paul Walmsley wrote:
> > 
> > Commit 621a0147d5c921f4cc33636ccd0602ad5d7cbfbc ("ARM: 7757/1: mm:
> > don't flush icache in switch_mm with hardware broadcasting") breaks
> > the boot on OMAP2430SDP with omap2plus_defconfig.  Tracked to an
> > undefined instruction abort from the CP15 read in
> > cache_ops_need_broadcast().  It turns out that early ARM1136 variants
> > don't support several CP15 registers that later ARM cores do.
> > ARM1136JF-S TRM section 3.2.1 "Register allocation" has the details.
> 
> Intriguing... I wouldn't expect a cp15 read to be emitted for 1136, since
> the SMP_ON_UP magic should have caused is_smp() to return false.

The compiler seems to be doing something quite odd with schedule():

00000540 <__schedule>:
 558:   ee103ff1        mrc     15, 0, r3, cr0, cr1, {7}
 560:   e1a03623        lsr     r3, r3, #12
 564:   e203300f        and     r3, r3, #15
 570:   e50b3080        str     r3, [fp, #-128] ; 0x80
...
 6e8:   e59f3428        ldr     r3, [pc, #1064] ; b18 <__schedule+0x5d8>
 6f0:   e5933000        ldr     r3, [r3]
 6f4:   e3530000        cmp     r3, #0
 6f8:   1a000027        bne     79c <__schedule+0x25c>
 6fc:   e2851f65        add     r1, r5, #404    ; 0x194
 700:   ebfffffe        bl      0 <_test_and_set_bit>
                        700: R_ARM_CALL _test_and_set_bit
...
 79c:   e51b1080        ldr     r1, [fp, #-128] ; 0x80
 7a0:   e3510000        cmp     r1, #0
 7a4:   1affffd4        bne     6fc <__schedule+0x1bc>
                        b18: R_ARM_ABS32        smp_on_up

Yes - that's right, it's reading from the mcr at the very start of
__schedule and storing it on the stack for just one test later on.  The
act of storing it on the stack and reading it back of is likely a lot
more expensive than reading it from CP15 in the first place!

We don't want to make the asm() itself volatile, because that means the
asm() statement can't be optimised away.

Adding a memory clobber to that asm seems to place it more appropriately,
but again, that's not particularly desirable.

Another solution would be to make both tlb_ops_need_broadcast and
cache_ops_need_broadcast both be a flag test, but that introduces a
memory load in all those paths no matter if we're running on SMP_ON_UP
or not.



More information about the linux-arm-kernel mailing list