[PATCH v2] arm64: tlbflush: Don't broadcast if mm was only active on local cpu

Linu Cherian linu.cherian at arm.com
Mon Jun 15 21:54:44 PDT 2026


Hi,

On Mon, Jun 15, 2026 at 12:21:19PM +0100, Ryan Roberts wrote:
> On 14/06/2026 12:04, Will Deacon wrote:
> > On Sat, May 23, 2026 at 07:17:10PM +0530, Linu Cherian wrote:
> >> From: Ryan Roberts <ryan.roberts at arm.com>
> >>
> >> There are 3 variants of tlb flush that invalidate user mappings:
> >> flush_tlb_mm(), flush_tlb_page() and __flush_tlb_range(). All of these
> >> would previously unconditionally broadcast their tlbis to all cpus in
> >> the inner shareable domain.
> >>
> >> But this is a waste of effort if we can prove that the mm for which we
> >> are flushing the mappings has only ever been active on the local cpu. In
> >> that case, it is safe to avoid the broadcast and simply invalidate the
> >> current cpu.
> >>
> >> So let's track in mm_context_t::active_cpu whether the mm has never been
> >> active on any cpu, has been active on more than 1 cpu, or has been
> >> active on precisely 1 cpu - and in that case, which one. We update this
> >> when switching context, being careful to ensure that it gets updated
> >> *before* installing the mm's pgtables. On the reader side, we ensure we
> >> read *after* the previous write(s) to the pgtable(s) that necessitated
> >> the tlb flush have completed. This guarantees that if a cpu that is
> >> doing a tlb flush sees its own id in active_cpu, then the old pgtable
> >> entry cannot have been seen by any other cpu and we can flush only the
> >> local cpu.
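
For readers following the thread, the state tracking described in the quoted
paragraph above can be modelled in userspace roughly as below. This is only a
sketch under assumptions, not the code from the patch: struct mm_model, the two
sentinel values and the helper names are all hypothetical, and C11 sequentially
consistent atomics stand in for the barriers (such as dsb(ishst)) that the real
arm64 code relies on.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical sentinels; the patch's actual encoding may differ. */
#define ACTIVE_CPU_NONE     (-1)  /* mm has never been active on any cpu */
#define ACTIVE_CPU_MULTIPLE (-2)  /* mm has been active on more than one cpu */

/* Hypothetical stand-in for the active_cpu field of mm_context_t. */
struct mm_model {
    atomic_int active_cpu;
};

static void mm_model_init(struct mm_model *mm)
{
    atomic_init(&mm->active_cpu, ACTIVE_CPU_NONE);
}

/*
 * Models the context-switch side: record that 'cpu' is about to run this mm.
 * In the scheme described above this must happen before the mm's pgtables
 * are installed on that cpu.
 */
static void mm_model_switch_to(struct mm_model *mm, int cpu)
{
    int old = atomic_load(&mm->active_cpu);

    /* Nothing to do if we are already the sole owner or already shared. */
    while (old != cpu && old != ACTIVE_CPU_MULTIPLE) {
        int next = (old == ACTIVE_CPU_NONE) ? cpu : ACTIVE_CPU_MULTIPLE;

        /* On failure 'old' is reloaded and the loop re-evaluates it. */
        if (atomic_compare_exchange_weak(&mm->active_cpu, &old, next))
            break;
    }
}

/*
 * Models the flush side: a local-only invalidate is allowed only if the
 * flushing cpu sees its own id, i.e. no other cpu has ever run this mm.
 * The read must come after the pgtable write that made the flush necessary.
 */
static bool mm_model_can_flush_local(struct mm_model *mm, int cpu)
{
    return atomic_load(&mm->active_cpu) == cpu;
}
```

In this model the state only ever moves from "none" to a single cpu id to
"multiple", and never back, so once a second cpu has run the mm every later
flush takes the broadcast path.
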
> >>
> >> Signed-off-by: Ryan Roberts <ryan.roberts at arm.com>
> >> Reviewed-by: Catalin Marinas <catalin.marinas at arm.com>
> >> Tested-by: Huang Ying <ying.huang at linux.alibaba.com>
> >> [linu.cherian at arm.com: Adapted for v7.1 flush tlb API changes]
> >> Signed-off-by: Linu Cherian <linu.cherian at arm.com>
> >> ---
> >> Changelog from RFC v1:
> >> - Adapted for v7.1 flush tlb API changes
> >>   No changes in core logic
> >> - Collected Rb and Tb tags
> >> - lat_mmap benchmark showed dsb(ishst) performs better than dsb(ish),
> >>   hence retained dsb(ishst) in flush_tlb_user_pre
> >>
> >>
> >> Testing with 7.1-rc4:
> >> +-----------------------+---------------------------------------------------+-------------+
> >> | Benchmark             | Result Class                                      | Improvement |
> >> +=======================+===================================================+=============+
> >> | perf/syscall          | fork (ops/sec)                                    |   (I) 3.25% |
> >> +-----------------------+---------------------------------------------------+-------------+
> >> | pts/memtier-benchmark | Protocol: Redis Clients: 100 Ratio: 1:5 (Ops/sec) |   (I) 2.70% |
> >> |                       | Protocol: Redis Clients: 100 Ratio: 5:1 (Ops/sec) |   (I) 2.13% |
> >> +-----------------------+---------------------------------------------------+-------------+
> > 
> > I think we need a much more comprehensive set of benchmarks before we can
> > begin to consider a change like this.
> 
> I believe that Linu ran a wider set of benchmarks and didn't find any
> regressions. These are just the ones that show improvement (Linu, please correct
> me and/or provide details).

Yes, that's correct.


--
Thanks,
Linu Cherian.



More information about the linux-arm-kernel mailing list