Excessive TLB flush ranges

Baoquan He bhe at redhat.com
Tue May 16 02:48:18 PDT 2023


On 05/16/23 at 10:54am, Thomas Gleixner wrote:
> On Tue, May 16 2023 at 16:07, Baoquan He wrote:
> > On 05/16/23 at 08:40am, Thomas Gleixner wrote:
> >> On Tue, May 16 2023 at 10:26, Baoquan He wrote:
> >> > On 05/15/23 at 08:17pm, Uladzislau Rezki wrote:
> >> >> For systems which lack a full TLB flush and to flush a long range is
> >> >> a problem(it takes time), probably we can flush VA one by one. Because
> >> >> currently we calculate a flush range [min:max] and that range includes
> >> >> the space that might not be mapped at all. Like below:
> >> >
> >> > It's fine if we only calculate a flush range of [min:max] with VA. In
> >> > vm_reset_perms(), it calculates the flush range with the impacted direct
> >> > mapping range, then merge it with VA's range. That looks really strange
> >> > and surprising. If the vm->pages[] are got from a lower part of physical
> >> > memory, the final merged flush will span tremendous range. Wondering why
> >> > we need merge the direct map range with VA range, then do flush. Not
> >> > sure if I misunderstand it.
> >> 
> >> So what happens on this BPF teardown is:
> >> 
> >> The vfree(8k) ends up flushing 3 entries. The actual vmalloc part (2) and
> >> one extra which is in the direct map. I haven't verified that yet, but I
> >> assume it's the alias of one of the vmalloc'ed pages.
> >
> > It looks like the reason. As Uladzislau pointed out, ARCH-es may
> > have full TLB flush, so won't get trouble from the merged flush
> > in the calculated [min:max] way, e.g arm64 and x86's flush_tlb_kernel_range().
> > However, arm32 seems lacking the ability of full TLB flash.
> 
> ARM has a full flush, but it does not check for that in
> flush_tlb_kernel_range().
> 
> > If agreed, I can make a draft patch to do the flush for direct map and
> > VA seperately, see if it works.
> 
> Of course it works. Already done that.
> 
> But you are missing the point. Look at the examples I provided.
> 
> The current implementation ends up doing a full flush on x86 just to
> flush 3 TLB entries. For the very same reason because the flush range
> (start..end) becomes insanely large due to the direct map and vmalloc
> parts.
> 
> But doing indivudual flushes for direct map and vmalloc space is silly
> too because then it ends up doing two IPIs instead of one. IPIs are
> expensive and the whole point of coalescing the flushes is to spare
> IPIs, no?
> 
> So with my hacked up flush_tlb_kernel_vas() I end up having exactly
> _one_ IPI which walks the list and flushes the 3 TLB entries.

Makes sense, thanks for telling. While your handling about alias_va may
not be right. I will add inline comment in your patch, please check
there. 




More information about the linux-arm-kernel mailing list