Excessive TLB flush ranges
Thomas Gleixner <tglx at linutronix.de>
Tue May 16 10:04:34 PDT 2023
On Tue, May 16 2023 at 17:01, Uladzislau Rezki wrote:
> On Tue, May 16, 2023 at 04:38:58PM +0200, Thomas Gleixner wrote:
>> There is a world outside of x86, but even on x86 it's borderline silly
>> to take the whole TLB out when you can flush 3 TLB entries one by one
>> with exactly the same number of IPIs, i.e. _one_. No?
>>
> I meant if we invoke flush_tlb_kernel_range() on each VA's individual
> range:
>
> <ARM>
> void flush_tlb_kernel_range(unsigned long start, unsigned long end)
> {
>         if (tlb_ops_need_broadcast()) {
>                 struct tlb_args ta;
>                 ta.ta_start = start;
>                 ta.ta_end = end;
>                 on_each_cpu(ipi_flush_tlb_kernel_range, &ta, 1);
>         } else
>                 local_flush_tlb_kernel_range(start, end);
>         broadcast_tlb_a15_erratum();
> }
> </ARM>
>
> we should IPI and wait, no?
The else clause does not issue an IPI, but that's irrelevant here.

The proposed flush_tlb_kernel_vas(list, num_pages) mechanism achieves
two things (see the sketch below):

  1) It batches multiple ranges into _one_ invocation.

  2) It lets the architecture decide, based on the number of pages,
     whether it does a tlb_flush_all() or flushes the individual
     ranges.

Whether the architecture uses IPIs or flushes only locally and lets
the hardware propagate the invalidation is completely irrelevant.
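
A minimal sketch of what the architecture side could look like, in
the spirit of x86's tlb_single_page_flush_ceiling heuristic. The
helper names, the tlb_vas_flush_ceiling cutoff and the use of ARM's
local_flush_tlb_kernel_range() are illustrative assumptions, not
taken from an actual patch:

#include <linux/list.h>
#include <linux/smp.h>
#include <linux/vmalloc.h>
#include <asm/tlbflush.h>

/* Made-up per-architecture cutoff: beyond this many pages a full
 * flush is assumed cheaper than flushing the individual ranges. */
static unsigned long tlb_vas_flush_ceiling = 33;

static void do_flush_tlb_vas(void *info)
{
	struct list_head *vas = info;
	struct vmap_area *va;

	/* Flush every VA's real range; all of them within one IPI. */
	list_for_each_entry(va, vas, list)
		local_flush_tlb_kernel_range(va->va_start, va->va_end);
}

void flush_tlb_kernel_vas(struct list_head *list, unsigned long num_pages)
{
	if (num_pages > tlb_vas_flush_ceiling) {
		/* Decision #2: the aggregate justifies a full flush. */
		flush_tlb_all();
	} else {
		/* Decision #1: one invocation, hence _one_ IPI, for
		 * all ranges combined. */
		on_each_cpu(do_flush_tlb_vas, list, 1);
	}
}

With that, the three-page example results in one IPI and three
single-range invalidations instead of a full flush.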
Right now any coalesced range, which is huge due to massive holes,
takes decision #2 away.

If you want to flush individual VAs from the core vmalloc code, then
you lose #1, and the architecture never sees the aggregated number of
pages, which might well have justified a tlb_flush_all().

That's a pure architecture decision, and all the core code needs to
do is provide appropriate information instead of a completely bogus
request to flush 17312759359 pages, i.e. a ~64.5 TB range, while in
reality there are exactly _three_ distinct pages to flush.
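
On the core side, providing that information could look like the
following rough sketch. The call site in mainline would be
__purge_vmap_area_lazy() in mm/vmalloc.c; the helper below is made
up for illustration:

#include <linux/list.h>
#include <linux/mm.h>
#include <linux/vmalloc.h>

static void purge_flush_vas(struct list_head *purge_list)
{
	unsigned long nr_pages = 0;
	struct vmap_area *va;

	/* Aggregate the real number of pages instead of coalescing
	 * [min(va_start), max(va_end)] into one bogus giant range. */
	list_for_each_entry(va, purge_list, list)
		nr_pages += (va->va_end - va->va_start) >> PAGE_SHIFT;

	/* One call; the architecture decides what to do with it. */
	flush_tlb_kernel_vas(purge_list, nr_pages);
}

For the example above nr_pages would be 3, not 17312759359, and the
architecture can trivially pick the cheap option.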
Thanks,
tglx