[PATCH -v2 2/2] arm64, tlbflush: don't TLBI broadcast if page reused in write fault

Huang, Ying ying.huang at linux.alibaba.com
Wed Oct 22 02:46:48 PDT 2025


Barry Song <21cnbao at gmail.com> writes:

>>
>> With PTL, this becomes
>>
>> CPU0:                           CPU1:
>>
>> page fault                      page fault
>> lock PTL
>> write PTE
>> do local tlbi
>> unlock PTL
>>                                 lock PTL        <- pte visible to CPU 1
>>                                 read PTE        <- new PTE
>>                                 do local tlbi   <- new PTE
>>                                 unlock PTL
>
> I agree. Yet the ish barrier can still avoid the page faults during CPU0's PTL.

IIUC, you think that dsb(ish), compared with dsb(nsh), can make the
memory write visible to other CPUs sooner.  TBH, I suspect that this is
the case.
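
For reference, the PTL sequence quoted at the top can be walked through
with a toy single-threaded model.  This is only a sketch: every name in
it (toy_pte, try_write, write_fault, simulate_reuse_faults) is
hypothetical and none of it is kernel code.  It models one shared PTE
and one cached TLB entry per CPU, and it does not model barriers
(dsb ish vs. nsh) or hardware page-walker timing at all.  It only shows
that, with a local-only TLBI, the second CPU takes one extra fault on
its stale entry, finds the new PTE under the PTL, and then proceeds.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 2

/* Toy PTE: only the writable bit matters in this model. */
struct toy_pte { bool writable; };

static struct toy_pte page_table;      /* the single shared PTE */
static struct toy_pte tlb[NR_CPUS];    /* each CPU's cached copy */
static bool tlb_valid[NR_CPUS];
static int faults[NR_CPUS];

/* Local invalidation: drops only this CPU's cached entry. */
static void local_tlbi(int cpu)
{
	tlb_valid[cpu] = false;
}

/* Attempt a write; on a TLB miss, refill from the page table. */
static bool try_write(int cpu)
{
	if (!tlb_valid[cpu]) {
		tlb[cpu] = page_table;
		tlb_valid[cpu] = true;
	}
	return tlb[cpu].writable;
}

/*
 * Write-fault handler.  The PTL is implied here because the model runs
 * the handlers strictly one after another.
 */
static void write_fault(int cpu)
{
	faults[cpu]++;
	/* lock PTL */
	if (!page_table.writable)
		page_table.writable = true;  /* page reused: upgrade in place */
	local_tlbi(cpu);                     /* no broadcast to other CPUs */
	/* unlock PTL */
}

/* Retry the write until the translation allows it. */
static void write_access(int cpu)
{
	while (!try_write(cpu))
		write_fault(cpu);
}

/* Returns the number of write faults taken by CPU1. */
int simulate_reuse_faults(void)
{
	page_table.writable = false;
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		faults[cpu] = 0;
		tlb_valid[cpu] = false;
		try_write(cpu);              /* cache the read-only entry */
	}

	write_access(0);  /* CPU0 faults, upgrades the PTE, local tlbi */
	write_access(1);  /* CPU1 faults on its stale entry, sees new PTE */

	assert(page_table.writable);
	return faults[1];
}
```

Calling simulate_reuse_faults() from a small main() is enough to run
it.  In this model CPU1 takes a single fault on its stale entry and
then succeeds after its own local tlbi, which matches the quoted
sequence.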

> CPU0:                           CPU1:
>
> lock PTL
> write pte;
> Issue ish barrier
> do local tlbi;
>                                 No page fault occurs if tlb misses
> unlock PTL
>
> Otherwise, it could be:
>
>
> CPU0:                           CPU1:
>
> lock PTL
> write pte;
> Issue nsh barrier
> do local tlbi;
>                                 page fault occurs if tlb misses
> unlock PTL
>
> Not quite sure if adding an ish right after the PTE modification has any
> noticeable performance impact on the test? I assume the most expensive part
> is still the tlbi broadcast dsb, not the PTE memory sync barrier?

---
Best Regards,
Huang, Ying


