[PATCH 12/12] um: refactor TLB update handling

Anton Ivanov anton.ivanov at kot-begemot.co.uk
Mon Apr 22 00:51:56 PDT 2024



On 22/04/2024 08:22, Benjamin Berg wrote:
> On Mon, 2024-04-22 at 10:51 +0800, Tiwei Bie wrote:
>> On 4/18/24 5:23 PM, benjamin at sipsolutions.net wrote:
>>> diff --git a/arch/um/include/asm/mmu.h b/arch/um/include/asm/mmu.h
>>> index 37eb6e89e79a..bf8da736609c 100644
>>> --- a/arch/um/include/asm/mmu.h
>>> +++ b/arch/um/include/asm/mmu.h
>>> @@ -10,6 +10,10 @@
>>>   
>>>   typedef struct mm_context {
>>>   	struct mm_id id;
>>> +
>>> +	/* Address range in need of a TLB sync */
>>> +	long int sync_tlb_range_from;
>>> +	long int sync_tlb_range_to;
>>
>> Why not "unsigned long"?
> 
> Oops, yes, it should be "unsigned long".
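For reference, this is what the struct would look like with that fix
folded in (same fields as the patch, only the type changed):

	typedef struct mm_context {
		struct mm_id id;

		/* Address range in need of a TLB sync */
		unsigned long sync_tlb_range_from;
		unsigned long sync_tlb_range_to;
	} mm_context_t;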
> 
>>
>>>   } mm_context_t;
>>>   
>>>   extern void __switch_mm(struct mm_id * mm_idp);
>>> diff --git a/arch/um/include/asm/pgtable.h b/arch/um/include/asm/pgtable.h
>>> index e1ece21dbe3f..5bb397b65efb 100644
>>> --- a/arch/um/include/asm/pgtable.h
>>> +++ b/arch/um/include/asm/pgtable.h
>>> @@ -244,6 +244,38 @@ static inline void set_pte(pte_t *pteptr, pte_t pteval)
>>>   
>>>   #define PFN_PTE_SHIFT		PAGE_SHIFT
>>>   
>>> +static inline void um_tlb_mark_sync(struct mm_struct *mm, unsigned long start,
>>> +				    unsigned long end)
>>> +{
>>> +	if (!mm->context.sync_tlb_range_to) {
>>> +		mm->context.sync_tlb_range_from = start;
>>> +		mm->context.sync_tlb_range_to = end;
>>> +	} else {
>>> +		if (start < mm->context.sync_tlb_range_from)
>>> +			mm->context.sync_tlb_range_from = start;
>>> +		if (end > mm->context.sync_tlb_range_to)
>>> +			mm->context.sync_tlb_range_to = end;
>>> +	}
>>> +}
>>
>> IIUC, in some cases, the range [sync_tlb_range_from, sync_tlb_range_to)
>> might become very large when merging non-adjacent ranges? Could that
>> be an issue?
> 
> I figured it is not a big problem. It will result in scanning the
> entire page table once to check whether the NEW_PAGE bit is set on any
> PTE. I am assuming that this will happen almost never, and scanning
> the page table (without doing syscalls) is reasonably cheap in the end.
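To make the worst case concrete: two small, far-apart updates collapse
into one big span, but the resulting walk stays syscall-free. An
illustrative sketch (the addresses are made up, not from the patch):

	/* Two distant 4K updates ... */
	um_tlb_mark_sync(mm, 0x1000, 0x2000);
	um_tlb_mark_sync(mm, 0x7f0000000000UL, 0x7f0000001000UL);
	/*
	 * ... leave [0x1000, 0x7f0000001000) marked, so the next
	 * um_tlb_sync() walks that whole range once, checking the
	 * NEW_PAGE bit on each PTE and issuing no syscalls for
	 * untouched entries.
	 */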
> 
>>> diff --git a/arch/um/include/asm/tlbflush.h b/arch/um/include/asm/tlbflush.h
>>> index d7cf82023b74..62816f6f1c91 100644
>>> --- a/arch/um/include/asm/tlbflush.h
>>> +++ b/arch/um/include/asm/tlbflush.h
>>> @@ -9,24 +9,50 @@
>>>   #include <linux/mm.h>
>>>   
>>>   /*
>>> - * TLB flushing:
>>> + * In UML, we need to sync the TLB by using mmap/munmap/mprotect syscalls
>>> + * from the process handling the MM (which can be the kernel itself).
>>> + *
>>> + * To track updates, we can hook into set_ptes and flush_tlb_*. With set_ptes
>>> + * we catch all PTE transitions where memory that was unusable becomes usable.
>>> + * With flush_tlb_* we can track any memory that becomes unusable, even when
>>> + * a higher layer of the page table was modified.
>>> + *
>>> + * So, we simply track updates using both methods and mark the memory area to
>>> + * be synced later on. The only special case is that flush_tlb_kernel_* needs
>>> + * to be executed immediately as there is no good synchronization point in
>>> + * that case. In contrast, in the set_ptes case we can wait for the next
>>> + * kernel segfault before we do the synchronization.
>>>    *
>>> - *  - flush_tlb() flushes the current mm struct TLBs
>>>    *  - flush_tlb_all() flushes all processes TLBs
>>>    *  - flush_tlb_mm(mm) flushes the specified mm context TLB's
>>>    *  - flush_tlb_page(vma, vmaddr) flushes one page
>>> - *  - flush_tlb_kernel_vm() flushes the kernel vm area
>>>    *  - flush_tlb_range(vma, start, end) flushes a range of pages
>>> + *  - flush_tlb_kernel_range(start, end) flushes a range of kernel pages
>>>    */
>>>   
>>> +extern int um_tlb_sync(struct mm_struct *mm);
>>> +
>>>   extern void flush_tlb_all(void);
>>>   extern void flush_tlb_mm(struct mm_struct *mm);
>>> -extern void flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
>>> -			    unsigned long end);
>>> -extern void flush_tlb_page(struct vm_area_struct *vma, unsigned long address);
>>> -extern void flush_tlb_kernel_vm(void);
>>> -extern void flush_tlb_kernel_range(unsigned long start, unsigned long end);
>>> -extern void __flush_tlb_one(unsigned long addr);
>>> +
>>> +static void flush_tlb_page(struct vm_area_struct *vma, unsigned long address)
>>> +{
>>> +	um_tlb_mark_sync(vma->vm_mm, address, address + PAGE_SIZE);
>>> +}
>>> +
>>> +static void flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
>>> +			    unsigned long end)
>>> +{
>>> +	um_tlb_mark_sync(vma->vm_mm, start, end);
>>> +}
>>> +
>>> +static void flush_tlb_kernel_range(unsigned long start, unsigned long end)
>>> +{
>>> +	um_tlb_mark_sync(&init_mm, start, end);
>>> +
>>> +	/* Kernel needs to be synced immediately */
>>> +	um_tlb_sync(&init_mm);
>>> +}
>>
>> Nit: this is a header file, these functions should be defined as inline functions.
> 
> Yup, thanks!
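With the nit applied, the helpers would then read (same bodies as in
the patch, just marked static inline):

	static inline void flush_tlb_page(struct vm_area_struct *vma,
					  unsigned long address)
	{
		um_tlb_mark_sync(vma->vm_mm, address, address + PAGE_SIZE);
	}

	static inline void flush_tlb_range(struct vm_area_struct *vma,
					   unsigned long start, unsigned long end)
	{
		um_tlb_mark_sync(vma->vm_mm, start, end);
	}

	static inline void flush_tlb_kernel_range(unsigned long start,
						  unsigned long end)
	{
		um_tlb_mark_sync(&init_mm, start, end);

		/* Kernel mappings need to be synced immediately */
		um_tlb_sync(&init_mm);
	}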
> 
>>> diff --git a/arch/um/kernel/tlb.c b/arch/um/kernel/tlb.c
>>> index c137ff6f84dd..232aa7601d5d 100644
>>> --- a/arch/um/kernel/tlb.c
>>> +++ b/arch/um/kernel/tlb.c
>> [...]
>>>   
>>> -void flush_tlb_kernel_range(unsigned long start, unsigned long end)
>>> -{
>>> -	flush_tlb_kernel_range_common(start, end);
>>> -}
>>> -
>>> -void flush_tlb_kernel_vm(void)
>>> -{
>>> -	flush_tlb_kernel_range_common(start_vm, end_vm);
>>> -}
>>
>> The build breaks with this change, as there is still a call to
>> flush_tlb_kernel_vm() in ubd.
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/um/drivers/ubd_kern.c?id=fb5d1d389c9e78d68f1f71f926d6251017579f5b#n774
> 
> Oh, thanks for the pointer!
> 
> I do not see a good reason for that call to even exist. My best theory
> right now is that it existed to avoid later page faults for new memory
> regions (the vmalloc area?). If so, it is a workaround that is no
> longer needed with this patch.

It has been there since prehistoric times.

No idea why it is there or what it is doing. IMHO it is not needed.

> 
> Benjamin
> 

-- 
Anton R. Ivanov
https://www.kot-begemot.co.uk/


