[QUESTION FOR ARM64 TLB] performance issue and implementation difference of TLB flush

Gang Li ligang.bdlg at bytedance.com
Fri May 5 02:48:55 PDT 2023


This thread accidentally lost its Cc list, so I am forwarding the missing
emails to the mailing list.

On 2023/4/28 17:27, Mark Rutland wrote:
> 
> 
> Hi,
> 
> Just to check -- did you mean to drop the other Ccs? It would be good to keep
> this discussion on-list if possible.
> 
> On Fri, Apr 28, 2023 at 01:49:46PM +0800, Gang Li wrote:
>> On 2023/4/27 15:30, Mark Rutland wrote:
>>> On Thu, Apr 27, 2023 at 11:26:50AM +0800, Gang Li wrote:
>>>> 1. I am curious to know the reason behind the design choice of flushing
>>>> the TLB on all cores for ARM64's clear_fixmap, while AMD64 only flushes
>>>> the TLB on a single core. Are there any TLB design details that make a
>>>> difference here?
>>>
>>> I don't know why arm64 only clears this on a single CPU.
>>
>> Sorry, I'm a bit confused.
>>
>> Did you mean you don't know why *amd64* only clears this on a single
>> CPU?
> 
> Yes, sorry; I meant to say "amd64" rather than "arm64" here.
> 
>> Looks like I should ask the amd64 folks 😉
> 
> 😉
> 
>>> On arm64 we *must* invalidate the TLB on all CPUs as the kernel page tables are
>>> shared by all CPUs, and the architectural Break-Before-Make rules require
>>> the TLB to be invalidated between two valid (but distinct) entries.
>>
>> ghes_unmap is protected by a spin_lock, so only one core can access this
>> memory area at a time. As I understand it, other cores should therefore
>> hold no TLB entries for this memory area.
>>
>> Is it because arm64 has speculative execution, so that even a core that
>> does not hold the spin_lock can still have the mapping cached in its TLB?
> 
> The architecture allows a CPU to allocate TLB entries at any time for any
> reason, for any valid translation table entries reachable from the root in
> TTBR{0,1}_ELx. That can be due to speculation, prefetching, and/or other
> reasons.
> 
> Due to that, it doesn't matter whether or not a CPU explicitly accesses a
> memory location -- TLB entries can be allocated regardless. Consequently, the
> spinlock doesn't make any difference.
> 
> Thanks,
> Mark.
> 
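
To make Mark's point concrete, here is a rough toy model in plain C. It is
only a sketch and not kernel code: every name in it (shared_pte,
speculative_fill, remap_slot, stale_after_remap, ...) is made up for
illustration, and it ignores barriers, ASIDs, walk caches and everything
else real hardware does. It models a single shared page-table slot and one
cached translation per CPU, and assumes that any CPU may cache the slot
while it is valid, whether or not that CPU holds the lock.

```c
#include <stdbool.h>

#define NR_CPUS 4

/* Toy model: one shared kernel page-table slot (e.g. a fixmap slot). */
struct pte { bool valid; unsigned long pfn; };

/* Toy model: at most one cached translation per CPU for that slot. */
struct tlb_entry { bool valid; unsigned long pfn; };

static struct pte shared_pte;
static struct tlb_entry tlb[NR_CPUS];

/* A CPU may cache any valid entry at any time (speculation, prefetch). */
static void speculative_fill(int cpu)
{
	if (shared_pte.valid) {
		tlb[cpu].valid = true;
		tlb[cpu].pfn = shared_pte.pfn;
	}
}

static void flush_local(int cpu)
{
	tlb[cpu].valid = false;
}

static void flush_all(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		flush_local(cpu);
}

/* Remap the slot in break / invalidate / make order. */
static void remap_slot(unsigned long new_pfn, int cpu, bool broadcast)
{
	shared_pte.valid = false;		/* break */
	if (broadcast)
		flush_all();			/* invalidate on every CPU */
	else
		flush_local(cpu);		/* invalidate on this CPU only */
	shared_pte.pfn = new_pfn;		/* make */
	shared_pte.valid = true;
}

/* Count CPUs whose cached translation disagrees with the page table. */
static int count_stale(void)
{
	int stale = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (tlb[cpu].valid &&
		    (!shared_pte.valid || tlb[cpu].pfn != shared_pte.pfn))
			stale++;
	return stale;
}

/* Scenario: CPU 0 holds the lock, CPU 2 caches the entry speculatively. */
static int stale_after_remap(bool broadcast)
{
	shared_pte = (struct pte){ .valid = true, .pfn = 100 };
	flush_all();
	speculative_fill(0);
	speculative_fill(2);	/* CPU 2 never takes the lock */
	remap_slot(200, 0, broadcast);
	return count_stale();
}
```

In this model stale_after_remap(false) leaves CPU 2 with a translation to
the old page, while stale_after_remap(true) leaves none. The spinlock never
appears in the model, because it has no effect on which CPUs may hold an
entry.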



