Excessive TLB flush ranges

Uladzislau Rezki urezki at gmail.com
Wed May 17 04:26:21 PDT 2023


On Tue, May 16, 2023 at 07:04:34PM +0200, Thomas Gleixner wrote:
> On Tue, May 16 2023 at 17:01, Uladzislau Rezki wrote:
> > On Tue, May 16, 2023 at 04:38:58PM +0200, Thomas Gleixner wrote:
> >> There is a world outside of x86, but even on x86 it's borderline silly
> >> to take the whole TLB out when you can flush 3 TLB entries one by one
> >> with exactly the same number of IPIs, i.e. _one_. No?
> >> 
> > I meant if we invoke flush_tlb_kernel_range() on each VA's individual
> > range:
> >
> > <ARM>
> > void flush_tlb_kernel_range(unsigned long start, unsigned long end)
> > {
> > 	if (tlb_ops_need_broadcast()) {
> > 		struct tlb_args ta;
> > 		ta.ta_start = start;
> > 		ta.ta_end = end;
> > 		on_each_cpu(ipi_flush_tlb_kernel_range, &ta, 1);
> > 	} else
> > 		local_flush_tlb_kernel_range(start, end);
> > 	broadcast_tlb_a15_erratum();
> > }
> > <ARM>
> >
> > we should IPI and wait, no?
> 
> The else clause does not do an IPI, but that's irrelevant.
> 
> The proposed flush_tlb_kernel_vas(list, num_pages) mechanism
> achieves:
> 
>   1) It batches multiple ranges to _one_ invocation
> 
>   2) It lets the architecture decide based on the number of pages
>      whether it does a tlb_flush_all() or a flush of individual ranges.
> 
> Whether the architecture uses IPIs or flushes only locally and the
> hardware propagates that is completely irrelevant.
> 
> Right now any coalesced range, which is huge due to massive holes, takes
> decision #2 away.
> 
> If you want to flush individual VAs from the core vmalloc code then you
> lose #1, as the aggregated number of pages might justify a tlb_flush_all().
> 
> That's a pure architecture decision and all the core code needs to do is
> to provide appropriate information and not some completely bogus request
> to flush 17312759359 pages, i.e. a ~64.5 TB range, while in reality
> there are exactly _three_ distinct pages to flush.
> 
1.

I think both cases (the logic) should be moved into arch code, so the
decision how to flush, either fully, if it is supported, or page by page,
which requires walking the list, is made _not_ by the vmalloc code.

As for the vmalloc interface, we can provide the list (we keep it short
because of the merging property) plus the number of pages to flush.
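Something like below, only a sketch to show the idea (the threshold name
and the policy are made up, an arch would pick its own):

void flush_tlb_kernel_vas(struct list_head *list, unsigned long num_pages)
{
	struct vmap_area *va;

	/* made-up per-arch threshold, not a real variable */
	if (num_pages > tlb_flush_all_threshold) {
		flush_tlb_all();
		return;
	}

	list_for_each_entry(va, list, list)
		flush_tlb_kernel_range(va->va_start, va->va_end);
}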

2.

It looks like your problem is caused by

void vfree(const void *addr)
{
...
	if (unlikely(vm->flags & VM_FLUSH_RESET_PERMS))
		vm_reset_perms(vm); <----
...
}

so all purged areas are drained in the caller's context, i.e. the caller is
blocked until the drain, including the flushing, is done. I am not sure why
it has to be done from the caller's context.

IMHO, it should be deferred the same way as we do it in:

static void free_vmap_area_noflush(struct vmap_area *va)

unless I miss the point why vfree() has to do it directly.
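
To show what I mean by "deferred", only a sketch (defer_vm_reset_perms()
is a made-up helper, and this leaves open when the backing pages can be
released back to the page allocator):

void vfree(const void *addr)
{
...
	if (unlikely(vm->flags & VM_FLUSH_RESET_PERMS))
		/*
		 * Made-up helper: queue the area and let the deferred
		 * purge path, where the TLB flush is already batched,
		 * reset the direct map permissions.
		 */
		defer_vm_reset_perms(vm);
...
}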

--
Uladzislau Rezki


