[PATCH -v2 2/2] arm64, tlbflush: don't TLBI broadcast if page reused in write fault

Huang, Ying ying.huang at linux.alibaba.com
Wed Oct 22 02:02:00 PDT 2025


Barry Song <21cnbao at gmail.com> writes:

>> >
>> > static inline void __flush_tlb_page_nosync(struct mm_struct *mm,
>> >                                            unsigned long uaddr)
>> > {
>> >         unsigned long addr;
>> >
>> >         dsb(ishst);
>> >         addr = __TLBI_VADDR(uaddr, ASID(mm));
>> >         __tlbi(vale1is, addr);
>> >         __tlbi_user(vale1is, addr);
>> >         mmu_notifier_arch_invalidate_secondary_tlbs(mm, uaddr & PAGE_MASK,
>> >                                                 (uaddr & PAGE_MASK) + PAGE_SIZE);
>> > }
>>
>> IIUC, _nosync() here means that it doesn't synchronize with the following
>> code.  It still synchronizes with the previous code, mainly the page table
>> changes.  And yes, there may be room to improve this.
>>
>> > On the other hand, __ptep_set_access_flags() doesn’t seem to use
>> > set_ptes(), so there’s no guarantee the updated PTEs are visible to all
>> > cores. If a remote CPU later encounters a page fault and performs a TLB
>> > invalidation, will it still see a stable PTE?
>>
>> I don't think so.  We only flush the local TLB in the local_flush_tlb_page()
>> family of functions.  So, we only need to guarantee that the page table
>> changes are visible to the local page table walker.  If a page fault occurs
>> on a remote CPU, we will call local_flush_tlb_page() on the remote CPU.
>>
>
> My concern is that:
>
> We don’t have a dsb(ish) to ensure the PTE update is visible to remote
> CPUs, since you’re using dsb(nsh). So even if a remote CPU performs
> local_flush_tlb_page(), the memory may not be synchronized yet, and it could
> still see the old PTE.

So, do you think that after the load/store unit of the remote CPU has
seen the new PTE, the page table walker could still see the old PTE?  I
doubt it.  Even if so, isn't the worst case just one extra spurious page
fault?  If the probability of that worst case is low enough, that should be OK.

Additionally, the page table lock is held both when writing the PTE on this
CPU and when re-reading the PTE on the remote CPU.  That provides some memory
ordering guarantees too.
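
To illustrate the locking argument, here is a minimal userspace sketch, not
kernel code.  A pthread mutex stands in for the page table lock, and every
fake_* name and FAKE_PTE_WRITE are hypothetical, made up for this example.
It only models the software-visible ordering that the lock gives to the PTE
write and the PTE re-read.  It does not model the hardware page table walker,
which is the part Barry is asking about.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical "writable" bit; this is not the arm64 PTE encoding. */
#define FAKE_PTE_WRITE 0x1u

enum fake_fault_result { FAKE_FAULT_FIXED, FAKE_FAULT_SPURIOUS };

struct fake_page_table {
        pthread_mutex_t ptl;    /* stands in for the page table lock */
        uint64_t pte;           /* only read or written with ptl held */
};

/* Stand-in for a local-only TLB invalidation; it does nothing here. */
static void fake_local_flush_tlb_page(void)
{
}

/*
 * Write fault path: take the lock, re-read the PTE, and upgrade it only
 * if no other CPU has done so already.  A CPU that finds the PTE already
 * writable has taken a spurious fault and only needs its local flush.
 */
static enum fake_fault_result fake_handle_write_fault(struct fake_page_table *pt)
{
        enum fake_fault_result res;

        pthread_mutex_lock(&pt->ptl);
        if (pt->pte & FAKE_PTE_WRITE) {
                res = FAKE_FAULT_SPURIOUS;
        } else {
                pt->pte |= FAKE_PTE_WRITE;
                res = FAKE_FAULT_FIXED;
        }
        fake_local_flush_tlb_page();
        pthread_mutex_unlock(&pt->ptl);

        return res;
}

struct fake_cpu {
        struct fake_page_table *pt;
        enum fake_fault_result res;
};

static void *fake_cpu_thread(void *arg)
{
        struct fake_cpu *cpu = arg;

        cpu->res = fake_handle_write_fault(cpu->pt);
        return NULL;
}

/* Let two "CPUs" fault on the same PTE; return the spurious fault count. */
static int fake_run_two_cpus(void)
{
        struct fake_page_table pt;
        struct fake_cpu cpus[2];
        pthread_t tid[2];
        int i, err, spurious = 0;

        pthread_mutex_init(&pt.ptl, NULL);
        pt.pte = 0;

        for (i = 0; i < 2; i++) {
                cpus[i].pt = &pt;
                cpus[i].res = FAKE_FAULT_FIXED;
                err = pthread_create(&tid[i], NULL, fake_cpu_thread, &cpus[i]);
                assert(err == 0);
                (void)err;
        }
        for (i = 0; i < 2; i++) {
                pthread_join(tid[i], NULL);
                spurious += (cpus[i].res == FAKE_FAULT_SPURIOUS);
        }
        pthread_mutex_destroy(&pt.ptl);

        return spurious;
}
```

Whichever thread takes the lock second always observes the PTE written by the
first one, so exactly one of the two faults ends up spurious.  That is the
kind of ordering the lock can give to the re-read of the PTE by software.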

---
Best Regards,
Huang, Ying


