[QUESTION FOR ARM64 TLB] performance issue and implementation difference of TLB flush
Gang Li
ligang.bdlg at bytedance.com
Fri May 5 05:28:35 PDT 2023
Hi,
I found that in `ghes_unmap`, which runs under a spinlock, arm64 and x86
use different strategies for flushing the TLB.
# arm64 call trace:
```
holding a spin lock
ghes_unmap
clear_fixmap
__set_fixmap
flush_tlb_kernel_range
```
# x86 call trace:
```
holding a spin lock
ghes_unmap
clear_fixmap
__set_fixmap
mmu.set_fixmap
native_set_fixmap
__native_set_fixmap
set_pte_vaddr
set_pte_vaddr_p4d
__set_pte_vaddr
flush_tlb_one_kernel
```
As we can see, ghes_unmap on arm64 eventually calls
flush_tlb_kernel_range to broadcast TLB invalidation, whereas on x86 it
calls flush_tlb_one_kernel, which only flushes the local CPU.
Why does arm64 need to broadcast TLB invalidation in ghes_unmap, when only
one CPU has accessed this memory area?
Mark Rutland said in
https://lore.kernel.org/lkml/369d1be2-d418-1bfb-bfc2-b25e4e542d76@bytedance.com/:
> The architecture (arm64) allows a CPU to allocate TLB entries at any time
> for any reason, for any valid translation table entries reachable from the
> root in TTBR{0,1}_ELx. That can be due to speculation, prefetching, and/or
> other reasons.
>
> Due to that, it doesn't matter whether or not a CPU explicitly accesses a
> memory location -- TLB entries can be allocated regardless. Consequently,
> the spinlock doesn't make any difference.
So arm64 broadcasts TLB invalidation in ghes_unmap because a TLB entry can be
allocated on any CPU, regardless of whether that CPU explicitly accesses the
memory.
Why doesn't x86 broadcast TLB invalidation in ghes_unmap? Is there any
difference between x86 and arm64 in TLB allocation and invalidation
strategy?
Thanks,
Gang Li