[PATCH] arm64: mm: check length in sync_icache_aliases for performance

Zhangshaokun zhangshaokun at hisilicon.com
Thu May 11 07:42:46 PDT 2017


Hi Mark

Thanks for your reply.

On 2017/5/11 17:16, Mark Rutland wrote:
> Hi,
> 
> On Thu, May 11, 2017 at 04:19:32PM +0800, Shaokun Zhang wrote:
>> sync_icache_aliases calls flush_icache_range if icache is non-aliasing
>> policy[see 0a28714 ("arm64: Use PoU cache instr for I/D coherency")].
>>   
>> If icache uses non-aliasing and page size is 64K, it will broadcast 1K
>> DVMs(IC IVAU) to other cpu cores per page. In multi-cores system, so many
>> DVMs would degenerate performance. Even if page size is 4K, 64 DVMs will
>> be broadcasted and executed.
> 
> Please note that this depends on the I-cache and D-cache line sizes,
> which are not necessarily 64 bytes.

Right. I am sorry that maybe i should explain I-cache line size is 64 bytes
in my case.

> 
> This is also dependent on system integration. DVMs are not an
> architectural concept, and the interconnect may optimize this (e.g. with
> snoop filters).

Hmm, SF is a good choice, However it may be not suitable for IC IVAU broadcast,
perhaps i am limited about this.

> 
>> This patch fixes this issue using invalidation icache all instread of by
>> VA when length is one or multiple PAGE_SIZE, especailly for
>> __sync_icache_dcache.
> 
> This means that we'll over-invalidate the I-caches all the time,
> potentially harming the performance of unrelated tasks. So this is not
> necessarily an improvement.

Agree its harm, therefore only under the condition that one or more pages
would be used IC IVAU, using invalidate the I-cache replaces it.

> 
> Do you have a particular workload which is affected by this?

I write self-modifying code that i want to simulate JVM, it uses mmap to
allocate large memory holding executing code. In the test procedure, i
found that __sync_icache_dcache would be called many times and lots of
DVMs occur. It is mainly used to handle page fault and memory migration.
When i add this check, it decreases number of DVMs. Because of much OOM
printing information, i couldn't give the result between the two scenes.
Maybe i need to optimize this test model.

Thanks
Shaokun

> 
> Thanks,
> Mark.
> 
>> Signed-off-by: Shaokun Zhang <zhangshaokun at hisilicon.com>
>> ---
>>  arch/arm64/mm/flush.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c
>> index 21a8d82..f71da2d 100644
>> --- a/arch/arm64/mm/flush.c
>> +++ b/arch/arm64/mm/flush.c
>> @@ -29,7 +29,7 @@ void sync_icache_aliases(void *kaddr, unsigned long len)
>>  {
>>  	unsigned long addr = (unsigned long)kaddr;
>>  
>> -	if (icache_is_aliasing()) {
>> +	if ((len >= PAGE_SIZE) || icache_is_aliasing()) {
>>  		__clean_dcache_area_pou(kaddr, len);
>>  		__flush_icache_all();
>>  	} else {
>> -- 
>> 1.9.1
>>
> 
> .
> 




More information about the linux-arm-kernel mailing list