[PATCH v2] arm64: tlbflush: Don't broadcast if mm was only active on local cpu
Linu Cherian
linu.cherian at arm.com
Mon Jun 15 21:54:44 PDT 2026
Hi,
On Mon, Jun 15, 2026 at 12:21:19PM +0100, Ryan Roberts wrote:
> On 14/06/2026 12:04, Will Deacon wrote:
> > On Sat, May 23, 2026 at 07:17:10PM +0530, Linu Cherian wrote:
> >> From: Ryan Roberts <ryan.roberts at arm.com>
> >>
> >> There are 3 variants of tlb flush that invalidate user mappings:
> >> flush_tlb_mm(), flush_tlb_page() and __flush_tlb_range(). All of these
> >> would previously unconditionally broadcast their tlbis to all cpus in
> >> the inner shareable domain.
> >>
> >> But this is a waste of effort if we can prove that the mm for which we
> >> are flushing the mappings has only ever been active on the local cpu. In
> >> that case, it is safe to avoid the broadcast and simply invalidate the
> >> current cpu.
> >>
> >> So let's track in mm_context_t::active_cpu whether the mm has never been
> >> active on any cpu, has been active on more than 1 cpu, or has been
> >> active on precisely 1 cpu - and in that case, which one. We update this
> >> when switching context, being careful to ensure that it gets updated
> >> *before* installing the mm's pgtables. On the reader side, we ensure we
> >> read *after* the previous write(s) to the pgtable(s) that necessitated
> >> the tlb flush have completed. This guarantees that if a cpu that is
> >> doing a tlb flush sees its own id in active_cpu, then the old pgtable
> >> entry cannot have been seen by any other cpu and we can flush only the
> >> local cpu.
> >>
> >> Signed-off-by: Ryan Roberts <ryan.roberts at arm.com>
> >> Reviewed-by: Catalin Marinas <catalin.marinas at arm.com>
> >> Tested-by: Huang Ying <ying.huang at linux.alibaba.com>
> >> [linu.cherian at arm.com: Adapted for v7.1 flush tlb API changes]
> >> Signed-off-by: Linu Cherian <linu.cherian at arm.com>
> >> ---
> >> Changelog from RFC v1:
> >> - Adapted for v7.1 flush tlb API changes
> >> No changes in core logic
> >> - Collected Rb and Tb tags
> >> - lat_mmap benchmark showed dsb(ishst) performs better than dsb(ish),
> >> hence retained dsb(ishst) in flush_tlb_user_pre
> >>
> >>
> >> Testing with 7.1-rc4:
> >> +-----------------------+---------------------------------------------------+-------------+
> >> | Benchmark             | Result Class                                      | Improvement |
> >> +=======================+===================================================+=============+
> >> | perf/syscall          | fork (ops/sec)                                    | (I) 3.25%   |
> >> +-----------------------+---------------------------------------------------+-------------+
> >> | pts/memtier-benchmark | Protocol: Redis Clients: 100 Ratio: 1:5 (Ops/sec) | (I) 2.70%   |
> >> |                       | Protocol: Redis Clients: 100 Ratio: 5:1 (Ops/sec) | (I) 2.13%   |
> >> +-----------------------+---------------------------------------------------+-------------+
> >
> > I think we need a much more comprehensive set of benchmarks before we can
> > begin to consider a change like this.
>
> I believe that Linu ran a wider set of benchmarks and didn't find any
> regressions. These are just the ones that show improvement (Linu, please correct
> me and/or provide details).
Yes, that's correct.
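
For anyone following the thread without the patch open, the three-state
active_cpu tracking described in the quoted commit message can be sketched
roughly as below. This is a user-space illustration with C11 atomics, not the
patch code: struct mm_ctx_sketch, the two sentinel values and all function
names are hypothetical, and the sketch does not model the barrier placement
(such as the dsb(ishst) in flush_tlb_user_pre) that the real ordering
argument depends on.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical sentinels; the values used by the actual patch may differ. */
#define ACTIVE_CPU_NONE     (-1)  /* mm has never been active on any cpu */
#define ACTIVE_CPU_MULTIPLE (-2)  /* mm has been active on more than one cpu */

/* Hypothetical stand-in for the active_cpu field of mm_context_t. */
struct mm_ctx_sketch {
	_Atomic int active_cpu;   /* a sentinel above, or the single cpu id */
};

static void mm_ctx_init(struct mm_ctx_sketch *ctx)
{
	atomic_init(&ctx->active_cpu, ACTIVE_CPU_NONE);
}

/*
 * Writer side: record that 'cpu' is about to run this mm. In the scheme
 * described above, this must happen before the mm's pgtables are installed.
 */
static void note_switch_to(struct mm_ctx_sketch *ctx, int cpu)
{
	int seen = atomic_load(&ctx->active_cpu);

	/* Nothing to do if this cpu is already recorded or the mm is shared. */
	while (seen != cpu && seen != ACTIVE_CPU_MULTIPLE) {
		int next = (seen == ACTIVE_CPU_NONE) ? cpu : ACTIVE_CPU_MULTIPLE;

		/* On failure 'seen' is refreshed and the loop re-evaluates. */
		if (atomic_compare_exchange_weak(&ctx->active_cpu, &seen, next))
			break;
	}
}

/*
 * Reader side: a flush may skip the broadcast only if the flushing cpu sees
 * its own id, meaning no other cpu has ever run this mm.
 */
static bool flush_can_stay_local(struct mm_ctx_sketch *ctx, int cpu)
{
	return atomic_load(&ctx->active_cpu) == cpu;
}
```

Once a second cpu has been recorded, the state stays at "multiple" for good,
which matches the "has only ever been active on the local cpu" wording in the
commit message.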
--
Thanks,
Linu Cherian.