[PATCH] Optimize multi-CPU tlb flushing a little more

Catalin Marinas catalin.marinas at arm.com
Tue Sep 6 12:53:38 EDT 2011


On 23 August 2011 12:06, Russell King - ARM Linux
<linux at arm.linux.org.uk> wrote:
> The compiler does not conditionalize the assembly instructions for
> the tlb operations, which leads to sub-optimal code being generated
> when building a kernel for multiple CPUs.
>
> We can tweak things fairly simply as the code fragment below shows:
>
>    17f8:       e3120001        tst     r2, #1  ; 0x1
> ...
>    1800:       0a000000        beq     1808 <handle_pte_fault+0x194>
>    1804:       ee061f10        mcr     15, 0, r1, cr6, cr0, {0}
>    1808:       e3120004        tst     r2, #4  ; 0x4
>    180c:       0a000000        beq     1814 <handle_pte_fault+0x1a0>
>    1810:       ee081f36        mcr     15, 0, r1, cr8, cr6, {1}
> becomes:
>    17f0:       e3120001        tst     r2, #1  ; 0x1
>    17f4:       1e063f10        mcrne   15, 0, r3, cr6, cr0, {0}
>    17f8:       e3120004        tst     r2, #4  ; 0x4
>    17fc:       1e083f36        mcrne   15, 0, r3, cr8, cr6, {1}

We need to be careful in this area when the hardware does not support
a given TLB operation. A conditional undefined instruction may trigger
an undef abort even if its condition fails (I think that is currently
the case only on a Qualcomm implementation, but you never know what
future implementations will do).

IIUC, with your patch we could end up with a conditional inner
shareable TLB maintenance operation on a UP implementation which
doesn't have such an operation.
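
To make the concern concrete, here is a rough sketch in plain C of the
two code shapes from the quoted disassembly. It is only a toy model,
not kernel code: struct cpu_model, its two fields and both run_*()
helpers are hypothetical names made up for illustration, and the
"traps even when the condition fails" behaviour is simply assumed as a
property of the modelled CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of the concern above; not kernel code. */

enum outcome { OP_SKIPPED, OP_EXECUTED, OP_UNDEF_ABORT };

struct cpu_model {
	bool has_op;             /* CPU implements this TLB operation */
	bool undef_ignores_cond; /* assumed: an unimplemented encoding
	                          * traps even if its condition fails */
};

/*
 * "tst; beq 1f; mcr; 1:" shape: when the flag is clear the branch is
 * taken and the mcr is never issued, so it cannot trap.
 */
static enum outcome run_branch_form(const struct cpu_model *cpu, bool flag_set)
{
	assert(cpu != NULL);
	if (!flag_set)
		return OP_SKIPPED;
	return cpu->has_op ? OP_EXECUTED : OP_UNDEF_ABORT;
}

/*
 * "tst; mcrne" shape: the mcrne is always issued. On a CPU that lacks
 * the operation and does not honour the condition for undefined
 * encodings, it traps regardless of the flag.
 */
static enum outcome run_conditional_form(const struct cpu_model *cpu, bool flag_set)
{
	assert(cpu != NULL);
	if (!cpu->has_op && cpu->undef_ignores_cond)
		return OP_UNDEF_ABORT;
	if (!flag_set)
		return OP_SKIPPED;
	return cpu->has_op ? OP_EXECUTED : OP_UNDEF_ABORT;
}
```

With has_op = false, undef_ignores_cond = true and the flag clear,
run_branch_form() returns OP_SKIPPED while run_conditional_form()
returns OP_UNDEF_ABORT. On a CPU that implements the operation, or one
that honours the condition for undefined encodings, the two forms give
the same result in this model.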

-- 
Catalin


