[External] Re: [RFC PATCH v1 0/4] riscv: mm: Defer tlb flush to context_switch

Xu Lu luxu.kernel at bytedance.com
Sun Nov 2 23:06:07 PST 2025


Hi Guo Ren,

On Mon, Nov 3, 2025 at 11:44 AM Guo Ren <guoren at kernel.org> wrote:
>
> On Thu, Oct 30, 2025 at 9:57 PM Xu Lu <luxu.kernel at bytedance.com> wrote:
> >
> > When we need to flush the TLB of a remote cpu, there is no need to
> > send an IPI if the target cpu is not using the ASID we want to
> > flush. Instead, we can cache the TLB flush info in a percpu buffer
> > and defer the flush to the next context_switch.
> >
> > This reduces the number of IPIs due to TLB flush:
> >
> > * ltp - mmapstress01
> > Before: ~108k
> > After: ~46k
> Great result!
>
> I've some questions:
> 1. Do we need an accurate address flush by a new queue of
> flush_tlb_range_data? Why not flush the whole asid?

Flushing the whole address space may cause subsequent TLB misses.
Consider the following case: there is only one user-mode thread
frequently running on the target hart. When that thread sleeps and the
cpu context-switches to the idle thread, another thread of the same
process running on another hart modifies a mapping and needs to flush
the TLB. The first thread will then hit a large number of TLB misses
when it resumes. I want to balance the IPI count against the TLB
misses, roughly as sketched below.
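
To make the trade-off concrete, here is a rough sketch of the percpu
queue I have in mind. All names (deferred_flush_queue,
defer_flush_range, DEFERRED_FLUSH_MAX) are illustrative, not the
identifiers used in the patches, and the locking a remote enqueue
would need is omitted:

#include <linux/percpu.h>

struct deferred_flush_entry {
	unsigned long asid;
	unsigned long start;
	unsigned long size;
	unsigned long stride;
};

#define DEFERRED_FLUSH_MAX	8

struct deferred_flush_queue {
	unsigned int nr;
	bool flush_all;		/* queue overflowed: flush whole ASID */
	struct deferred_flush_entry entries[DEFERRED_FLUSH_MAX];
};

static DEFINE_PER_CPU(struct deferred_flush_queue, deferred_flushes);

/* Called instead of sending an IPI when @cpu is not using @asid. */
static void defer_flush_range(unsigned int cpu, unsigned long asid,
			      unsigned long start, unsigned long size,
			      unsigned long stride)
{
	struct deferred_flush_queue *q = &per_cpu(deferred_flushes, cpu);

	if (q->flush_all)
		return;
	if (q->nr == DEFERRED_FLUSH_MAX) {
		/* Give up precision, not correctness. */
		q->flush_all = true;
		return;
	}
	q->entries[q->nr++] = (struct deferred_flush_entry){
		.asid = asid, .start = start, .size = size, .stride = stride,
	};
}

The idea is that at context_switch each cpu drains its own queue with
precise sfence.vma's, falling back to a single full-ASID flush if
flush_all was set.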

> 2. If we reuse the context_tlb_flush_pending mechanism, could
> mmapstress01 gain the result better than ~46k?

Besides the lazy TLB flush, another way to reduce IPI overhead is to
clean mm_cpumask, and it does give a better result for mmapstress01. I
have sent a patch [1] which clears mm_cpumask whenever all TLB entries
of a given ASID are flushed, and it reduces the IPI count from ~98k to
268.

As mentioned in the previous email, the next version will also include
the mm_cpumask clearing procedure. Specifically, I will flush all TLB
entries of an ASID and clear the corresponding cpu in mm_cpumask
whenever the mm has not been scheduled on that cpu for enough context
switches.

[1] https://lore.kernel.org/all/20250827131444.23893-3-luxu.kernel@bytedance.com/
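
The core of that patch looks roughly like this (a simplified sketch,
not the exact code in [1]; cntx2asid() and local_flush_tlb_all_asid()
are the existing riscv helpers):

/*
 * Once every TLB entry for an mm's ASID is gone from this cpu, the
 * cpu can be dropped from mm_cpumask so later flushes of that mm
 * skip the IPI entirely. switch_mm() sets the bit again when the mm
 * is scheduled back onto this cpu.
 */
static void local_flush_asid_and_clear_cpumask(struct mm_struct *mm)
{
	unsigned long asid = cntx2asid(atomic_long_read(&mm->context.id));

	local_flush_tlb_all_asid(asid);
	cpumask_clear_cpu(smp_processor_id(), mm_cpumask(mm));
}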

> 3. If we meet the kernel address space, we must use IPI flush
> immediately, but I didn't see your patch consider that case, or am I
> wrong?

Nice catch! I forgot to add the kernel ASID check to the
should_ipi_flush function. I will add it in the next version, along
the lines of the sketch below.
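
(A sketch; I am assuming the helper keeps the name should_ipi_flush(),
that kernel flushes are still tagged with FLUSH_TLB_NO_ASID, and that
loaded_asid is the percpu variable from patch 1/4.)

static bool should_ipi_flush(unsigned int cpu, unsigned long asid)
{
	/*
	 * Kernel mappings are flushed with FLUSH_TLB_NO_ASID and are
	 * live on every hart, so they must take the immediate IPI
	 * path and never be deferred.
	 */
	if (asid == FLUSH_TLB_NO_ASID)
		return true;

	/* User flush: IPI only if the target hart is using this ASID. */
	return per_cpu(loaded_asid, cpu) == asid;
}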

I have considered skipping the IPI and deferring the TLB flush until
the next time the target hart enters S-mode, when the target hart is
currently running in user mode. But there are too many kernel entry
points to consider, especially now that we have SSE. For kernel TLB
flushes it seems safer to send the IPI. Thanks.

Best Regards,
Xu Lu

>
> >
> > Future plan in the next version:
> >
> > - This patch series reduces IPIs by deferring the TLB flush to
> > context_switch. It does not clear the mm_cpumask of the target
> > mm_struct. In the next version, I will apply a threshold to the
> > number of ASIDs maintained by each cpu's TLB. Once the threshold is
> > exceeded, the ASID that has not been used for the longest time will
> > be flushed out, and the current cpu will be cleared in the
> > mm_cpumask.
> >
> > Thanks in advance for your comments.
> >
> > Xu Lu (4):
> >   riscv: mm: Introduce percpu loaded_asid
> >   riscv: mm: Introduce percpu tlb flush queue
> >   riscv: mm: Enqueue tlbflush info if task is not running on target cpu
> >   riscv: mm: Perform tlb flush during context_switch
> >
> >  arch/riscv/include/asm/mmu_context.h |  1 +
> >  arch/riscv/include/asm/tlbflush.h    |  4 ++
> >  arch/riscv/mm/context.c              | 10 ++++
> >  arch/riscv/mm/tlbflush.c             | 76 +++++++++++++++++++++++++++-
> >  4 files changed, 90 insertions(+), 1 deletion(-)
> >
> > --
> > 2.20.1
> >
>
>
> --
> Best Regards
>  Guo Ren


