Excessive TLB flush ranges

Baoquan He bhe at redhat.com
Wed May 17 03:52:24 PDT 2023


On 05/17/23 at 11:38am, Thomas Gleixner wrote:
> On Tue, May 16 2023 at 21:03, Thomas Gleixner wrote:
> >
> > Aside of that, if I read the code correctly then if there is an unmap
> > via vb_free() which does not cover the whole vmap block then vb->dirty
> > is set and every _vm_unmap_aliases() invocation flushes that dirty range
> > over and over until that vmap block is completely freed, no?
> 
> Something like the below would cure that.
> 
> While it prevents that this is flushed forever it does not cure the
> eventually overly broad flush when the block is completely dirty and
> purged:
> 
> Assume a block with 1024 pages, where 1022 pages are already freed and
> TLB flushed. Now the last 2 pages are freed and the block is purged,
> which results in a flush of 1024 pages where 1022 are already done,
> right?

This is a good idea. I was thinking about how to reply to your last mail
and how to fix this. However, your fix may not work well in all cases.
Please see the inline comments below.

One vmap block has 64 pages.
#define VMAP_MAX_ALLOC          BITS_PER_LONG   /* 256K with 4K pages */

> 
> Thanks,
> 
>         tglx
> ---
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2211,7 +2211,7 @@ static void vb_free(unsigned long addr,
>  
>  	spin_lock(&vb->lock);
>  
> -	/* Expand dirty range */
> +	/* Expand the not yet TLB flushed dirty range */
>  	vb->dirty_min = min(vb->dirty_min, offset);
>  	vb->dirty_max = max(vb->dirty_max, offset + (1UL << order));
>  
> @@ -2240,13 +2240,17 @@ static void _vm_unmap_aliases(unsigned l
>  		rcu_read_lock();
>  		list_for_each_entry_rcu(vb, &vbq->free, free_list) {
>  			spin_lock(&vb->lock);
> -			if (vb->dirty && vb->dirty != VMAP_BBMAP_BITS) {
> +			if (vb->dirty_max && vb->dirty != VMAP_BBMAP_BITS) {
>  				unsigned long va_start = vb->va->va_start;
>  				unsigned long s, e;

When vb_free() is invoked, it can leave a vmap_block in one of the three
states shown below. Your code works well for the 2nd case, but it may not
for the 1st one. The 2nd case is the kind of block that we reclaim and
put onto the purge list in purge_fragmented_blocks_allcpus().

1)
  |-----|------------|-----------|-------|
  |dirty|still mapped|   dirty   | free  |

2)
  |------------------------------|-------|
  |         dirty                | free  |

3) Handled by free_vmap_block(), and the vb is put onto the purge list.
  |--------------------------------------|
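
To make layout 1) concrete, here is a rough userspace sketch of the
dirty_min/dirty_max tracking with the reset from your patch applied. It
is only a model under assumptions: struct model_vb, model_init(),
model_free(), model_flush() and MODEL_BBMAP_BITS are hypothetical names
for illustration, not the kernel's struct vmap_block or its functions,
and the block size of 1024 pages is assumed.

```c
/* Hypothetical userspace model; not the kernel's struct vmap_block. */
#define MODEL_BBMAP_BITS 1024UL	/* assumed number of pages per block */

struct model_vb {
	unsigned long dirty;		/* pages freed so far */
	unsigned long dirty_min;	/* first page not yet TLB flushed */
	unsigned long dirty_max;	/* one past the last such page */
};

/* Start with an empty dirty range, same encoding as the reset in the patch. */
static void model_init(struct model_vb *vb)
{
	vb->dirty = 0;
	vb->dirty_min = MODEL_BBMAP_BITS;
	vb->dirty_max = 0;
}

/* Models vb_free(): expand the not yet flushed dirty range. */
static void model_free(struct model_vb *vb, unsigned long offset,
		       unsigned long nr_pages)
{
	vb->dirty += nr_pages;
	if (offset < vb->dirty_min)
		vb->dirty_min = offset;
	if (offset + nr_pages > vb->dirty_max)
		vb->dirty_max = offset + nr_pages;
}

/*
 * Models the per-block part of _vm_unmap_aliases() with the reset.
 * Returns the number of pages the flush range would cover, or 0 if
 * nothing is pending for this block.
 */
static unsigned long model_flush(struct model_vb *vb)
{
	unsigned long pages;

	if (!vb->dirty_max || vb->dirty == MODEL_BBMAP_BITS)
		return 0;

	pages = vb->dirty_max - vb->dirty_min;

	/* Prevent that this is flushed more than once */
	vb->dirty_min = MODEL_BBMAP_BITS;
	vb->dirty_max = 0;

	return pages;
}
```

In this model, freeing pages [0, 4) and [20, 30) while [4, 20) is still
mapped gives a single range of 30 pages on the first flush, although only
14 pages are dirty, because only min and max are tracked. A second flush
with no new frees covers nothing, and a later free of [30, 32) is flushed
as 2 pages.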

>  
>  				s = va_start + (vb->dirty_min << PAGE_SHIFT);
>  				e = va_start + (vb->dirty_max << PAGE_SHIFT);
>  
> +				/* Prevent that this is flushed more than once */
> +				vb->dirty_min = VMAP_BBMAP_BITS;
> +				vb->dirty_max = 0;
> +
>  				start = min(s, start);
>  				end   = max(e, end);
>  
> 



