Unnecessary cache-line flush on page table updates?
heechul Yun
heechul at illinois.edu
Fri Jul 1 17:42:00 EDT 2011
Great.
Removing the PTE flush appears to make a noticeable performance
difference in my tests. The following are lmbench 3.0a results
measured on a Cortex-A9 SMP platform. So far, I have not had any
problems while running various tests.
=========
mm-patch:
=========
Pagefaults on /tmp/XXX: 3.0759 microseconds
Process fork+exit: 464.5414 microseconds
Process fork+execve: 785.4944 microseconds
Process fork+/bin/sh -c: 488.6204 microseconds
=========
original:
=========
Pagefaults on /tmp/XXX: 3.6209 microseconds
Process fork+exit: 485.5236 microseconds
Process fork+execve: 820.0613 microseconds
Process fork+/bin/sh -c: 2966.3828 microseconds
Heechul
On Fri, Jul 1, 2011 at 3:10 AM, Catalin Marinas <catalin.marinas at arm.com> wrote:
> On Fri, Jul 01, 2011 at 08:04:42AM +0100, heechul Yun wrote:
>> According to the Cortex-A9 TRM, the MMU reads page table entries from the
>> L1 D-cache, not from memory. So I think we do not need to flush the cache
>> line in the following code, because the MMU will always see an up-to-date
>> view of the page table on both UP and SMP systems.
>>
>> linux/arch/arm/mm/proc-v7.S
>>
>> ENTRY(cpu_v7_set_pte_ext)
>> ...
>> mcr p15, 0, r0, c7, c10, 1 @ flush_pte from D-cache // why do we need this on A9?
>> …
>>
>> If it is necessary, could you please explain why? Thanks.
>
> No, it's not necessary; it is only there because this file is used by
> other processors as well. The solution below checks the ID_MMFR3[23:20]
> bits (coherent walk) and avoids flushing if the value is 1. The same could
> be done for PMD entries, though that's less critical than for PTEs.
>
> Please note that the patch is not fully tested.
>
> 8<--------------------
>
> From 67bd5ebdf622637f8293286146441e6292713c3d Mon Sep 17 00:00:00 2001
> From: Catalin Marinas <catalin.marinas at arm.com>
> Date: Fri, 1 Jul 2011 10:57:07 +0100
> Subject: [PATCH] ARMv7: Do not clean the PTE if coherent page table walk is supported
>
> This patch adds a check for the ID_MMFR3[23:20] bits (coherent walk) and
> only cleans the D-cache corresponding to a PTE if coherent page table
> walks are not supported.
>
> Signed-off-by: Catalin Marinas <catalin.marinas at arm.com>
> ---
> arch/arm/mm/proc-v7.S | 4 +++-
> 1 files changed, 3 insertions(+), 1 deletions(-)
>
> diff --git a/arch/arm/mm/proc-v7.S b/arch/arm/mm/proc-v7.S
> index 8013afc..fc5b36f 100644
> --- a/arch/arm/mm/proc-v7.S
> +++ b/arch/arm/mm/proc-v7.S
> @@ -166,7 +166,9 @@ ENTRY(cpu_v7_set_pte_ext)
> ARM( str r3, [r0, #2048]! )
> THUMB( add r0, r0, #2048 )
> THUMB( str r3, [r0] )
> - mcr p15, 0, r0, c7, c10, 1 @ flush_pte
> + mrc p15, 0, r3, c0, c1, 7 @ read ID_MMFR3
> + tst r3, #0xf << 20 @ check the coherent walk bits
> + mcreq p15, 0, r0, c7, c10, 1 @ flush_pte
> #endif
> mov pc, lr
> ENDPROC(cpu_v7_set_pte_ext)
>
> --
> Catalin
>
More information about the linux-arm-kernel mailing list