Unnecessary cache-line flush on page table updates ?
heechul Yun
heechul at illinois.edu
Mon Jul 4 17:24:35 EDT 2011
On Fri, Jul 1, 2011 at 2:42 PM, heechul Yun <heechul at illinois.edu> wrote:
> Great.
>
> Removing the PTE flush seems to make a noticeable performance
> difference in my tests. The following are lmbench 3.0a results
> measured on a Cortex-A9 SMP platform. So far I have not hit any
> problems while running various tests.
>
> =========
> mm-patch:
> =========
> Pagefaults on /tmp/XXX: 3.0759 microseconds
> Process fork+exit: 464.5414 microseconds
> Process fork+execve: 785.4944 microseconds
> Process fork+/bin/sh -c: 488.6204 microseconds
I realized that I made a big mistake in the fork+/bin/sh data (the other
numbers are fine). The test platform has no /bin/sh (Android has
/system/bin/sh instead). For the first test, with the original kernel,
I had made /bin a symlink to /system/bin, but I forgot to recreate it
after flashing the system with the new, patched kernel.
The following are the results for the patched kernel with the symlink
in place:
Pagefaults on /tmp/XXX: 3.0991 microseconds
Process fork+exit: 465.1620 microseconds
Process fork+execve: 781.5077 microseconds
Process fork+/bin/sh -c: 2804.2023 microseconds
The patched kernel is still noticeably faster (about 5~15%), but the
difference is not substantial.
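For reference, the per-benchmark improvement can be recomputed from the corrected patched numbers and the original-kernel numbers quoted further down. This is only a small arithmetic sketch; the helper name `reduction_percent` is made up for illustration and is not part of lmbench.

```python
# Latencies in microseconds, copied from the lmbench results in this thread.
original = {
    "Pagefaults on /tmp/XXX": 3.6209,
    "Process fork+exit": 485.5236,
    "Process fork+execve": 820.0613,
    "Process fork+/bin/sh -c": 2966.3828,
}
patched = {
    "Pagefaults on /tmp/XXX": 3.0991,
    "Process fork+exit": 465.1620,
    "Process fork+execve": 781.5077,
    "Process fork+/bin/sh -c": 2804.2023,
}


def reduction_percent(before, after):
    """Return how much lower `after` is than `before`, in percent."""
    return (before - after) / before * 100.0


# One entry per benchmark: percentage latency reduction with the patch.
improvements = {
    name: reduction_percent(original[name], patched[name]) for name in original
}

for name, pct in improvements.items():
    print(f"{name}: {pct:.1f}% lower latency")
```

The page fault case shows the largest gain (a little over 14%), while the three fork-based tests land at roughly 4 to 6%.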
Sorry for the confusion.
Heechul
>
> =========
> original:
> =========
> Pagefaults on /tmp/XXX: 3.6209 microseconds
> Process fork+exit: 485.5236 microseconds
> Process fork+execve: 820.0613 microseconds
> Process fork+/bin/sh -c: 2966.3828 microseconds
>
> Heechul
>
> On Fri, Jul 1, 2011 at 3:10 AM, Catalin Marinas <catalin.marinas at arm.com> wrote:
>> On Fri, Jul 01, 2011 at 08:04:42AM +0100, heechul Yun wrote:
>>> According to the Cortex-A9 TRM, the MMU reads page table entries
>>> from the L1 D-cache, not from memory. So I think we do not need to
>>> flush the cache line in the following code, because the MMU will
>>> always see an up-to-date view of the page tables on both UP and SMP
>>> systems.
>>>
>>> linux/arch/arm/mm/proc-v7.S
>>>
>>> ENTRY(cpu_v7_set_pte_ext)
>>> ...
>>> mcr p15, 0, r0, c7, c10, 1 @ flush_pte from D-cache // why we need this in A9?
>>> …
>>>
>>> If it is necessary, could you please explain the reason? Thanks.
>>
>> No, it's not necessary; it is only there because this file is used by
>> other processors as well. The solution below checks the
>> ID_MMFR3[23:20] bits (coherent walk) and avoids flushing if the value
>> is 1. The same could be done for PMD entries, though that's less
>> critical than the PTEs.
>>
>> Please note that the patch is not fully tested.
>>
>> 8<--------------------
>>
>> From 67bd5ebdf622637f8293286146441e6292713c3d Mon Sep 17 00:00:00 2001
>> From: Catalin Marinas <catalin.marinas at arm.com>
>> Date: Fri, 1 Jul 2011 10:57:07 +0100
>> Subject: [PATCH] ARMv7: Do not clean the PTE if coherent page table walk is supported
>>
>> This patch adds a check for the ID_MMFR3[23:20] bits (coherent walk) and
>> only cleans the D-cache corresponding to a PTE if coherent page table
>> walks are not supported.
>>
>> Signed-off-by: Catalin Marinas <catalin.marinas at arm.com>
>> ---
>> arch/arm/mm/proc-v7.S | 4 +++-
>> 1 files changed, 3 insertions(+), 1 deletions(-)
>>
>> diff --git a/arch/arm/mm/proc-v7.S b/arch/arm/mm/proc-v7.S
>> index 8013afc..fc5b36f 100644
>> --- a/arch/arm/mm/proc-v7.S
>> +++ b/arch/arm/mm/proc-v7.S
>> @@ -166,7 +166,9 @@ ENTRY(cpu_v7_set_pte_ext)
>> ARM( str r3, [r0, #2048]! )
>> THUMB( add r0, r0, #2048 )
>> THUMB( str r3, [r0] )
>> - mcr p15, 0, r0, c7, c10, 1 @ flush_pte
>> + mrc p15, 0, r3, c0, c1, 7 @ read ID_MMFR3
>> + tst r3, #0xf << 20 @ check the coherent walk bits
>> + mcreq p15, 0, r0, c7, c10, 1 @ flush_pte
>> #endif
>> mov pc, lr
>> ENDPROC(cpu_v7_set_pte_ext)
>>
>> --
>> Catalin
>>
>
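As a rough model of the condition in the quoted patch: `tst r3, #0xf << 20` sets the Z flag only when ID_MMFR3[23:20] is zero, and `mcreq` then performs the clean only in that case. The Python sketch below mirrors that logic so it can be checked by hand; the function names and the sample register values are hypothetical and were not read from real hardware.

```python
# Bit position and mask of the coherent walk field, ID_MMFR3[23:20].
COHERENT_WALK_SHIFT = 20
COHERENT_WALK_MASK = 0xF << COHERENT_WALK_SHIFT


def coherent_walk_field(id_mmfr3: int) -> int:
    """Extract the 4-bit coherent walk field from an ID_MMFR3 value."""
    return (id_mmfr3 & COHERENT_WALK_MASK) >> COHERENT_WALK_SHIFT


def needs_pte_clean(id_mmfr3: int) -> bool:
    """Mirror `tst r3, #0xf << 20` followed by `mcreq`.

    The clean is issued only when the masked field is zero, i.e. when
    coherent page table walks are not reported.
    """
    return (id_mmfr3 & COHERENT_WALK_MASK) == 0


# Hypothetical register values, chosen only to exercise both branches.
for value in (0x00000000, 0x00100000):
    print(
        f"ID_MMFR3={value:#010x} "
        f"field={coherent_walk_field(value)} "
        f"clean={needs_pte_clean(value)}"
    )
```

Note that the test in the patch skips the clean for any non-zero field value, not only for the value 1.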