[PATCH 12/12] um: refactor TLB update handling
Anton Ivanov
anton.ivanov at kot-begemot.co.uk
Mon Apr 22 00:51:56 PDT 2024
On 22/04/2024 08:22, Benjamin Berg wrote:
> On Mon, 2024-04-22 at 10:51 +0800, Tiwei Bie wrote:
>> On 4/18/24 5:23 PM, benjamin at sipsolutions.net wrote:
>>> diff --git a/arch/um/include/asm/mmu.h b/arch/um/include/asm/mmu.h
>>> index 37eb6e89e79a..bf8da736609c 100644
>>> --- a/arch/um/include/asm/mmu.h
>>> +++ b/arch/um/include/asm/mmu.h
>>> @@ -10,6 +10,10 @@
>>>
>>> typedef struct mm_context {
>>> 	struct mm_id id;
>>> +
>>> +	/* Address range in need of a TLB sync */
>>> +	long int sync_tlb_range_from;
>>> +	long int sync_tlb_range_to;
>>
>> Why not "unsigned long"?
>
> Oops, yes, it should be "unsigned long".
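> I.e. the fields should become:
>
> 	/* Address range in need of a TLB sync */
> 	unsigned long sync_tlb_range_from;
> 	unsigned long sync_tlb_range_to;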
>
>>
>>> } mm_context_t;
>>>
>>> extern void __switch_mm(struct mm_id * mm_idp);
>>> diff --git a/arch/um/include/asm/pgtable.h b/arch/um/include/asm/pgtable.h
>>> index e1ece21dbe3f..5bb397b65efb 100644
>>> --- a/arch/um/include/asm/pgtable.h
>>> +++ b/arch/um/include/asm/pgtable.h
>>> @@ -244,6 +244,38 @@ static inline void set_pte(pte_t *pteptr, pte_t pteval)
>>>
>>> #define PFN_PTE_SHIFT PAGE_SHIFT
>>>
>>> +static inline void um_tlb_mark_sync(struct mm_struct *mm, unsigned long start,
>>> +				    unsigned long end)
>>> +{
>>> +	if (!mm->context.sync_tlb_range_to) {
>>> +		mm->context.sync_tlb_range_from = start;
>>> +		mm->context.sync_tlb_range_to = end;
>>> +	} else {
>>> +		if (start < mm->context.sync_tlb_range_from)
>>> +			mm->context.sync_tlb_range_from = start;
>>> +		if (end > mm->context.sync_tlb_range_to)
>>> +			mm->context.sync_tlb_range_to = end;
>>> +	}
>>> +}
>>
>> IIUC, in some cases, the range [sync_tlb_range_from, sync_tlb_range_to)
>> might become very large when merging non-adjacent ranges? Could that
>> be an issue?
>
> I figured it is not a big problem. It will result in scanning the
> entire page table once to check whether the NEW_PAGE bit is set on any
> PTE. I am assuming that this will almost never happen, and that
> scanning the page table (without doing syscalls) is reasonably cheap
> in the end.
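>
> For example, if only two distant pages get marked, the pending range
> simply widens to cover both (an illustration of the code above, not a
> new API):
>
> 	um_tlb_mark_sync(mm, 0x1000, 0x2000);
> 	/* pending range is [0x1000, 0x2000) */
> 	um_tlb_mark_sync(mm, 0x7f0000000000, 0x7f0000001000);
> 	/* pending range is now [0x1000, 0x7f0000001000); um_tlb_sync()
> 	 * walks that whole span once, but only issues syscalls for
> 	 * PTEs that actually have the NEW_PAGE bit set. */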
>
>>> diff --git a/arch/um/include/asm/tlbflush.h b/arch/um/include/asm/tlbflush.h
>>> index d7cf82023b74..62816f6f1c91 100644
>>> --- a/arch/um/include/asm/tlbflush.h
>>> +++ b/arch/um/include/asm/tlbflush.h
>>> @@ -9,24 +9,50 @@
>>> #include <linux/mm.h>
>>>
>>> /*
>>> - * TLB flushing:
>>> + * In UML, we need to sync the TLB over by using mmap/munmap/mprotect syscalls
>>> + * from the process handling the MM (which can be the kernel itself).
>>> + *
>>> + * To track updates, we can hook into set_ptes and flush_tlb_*. With set_ptes
>>> + * we catch all PTE transitions where memory that was unusable becomes usable.
>>> + * With flush_tlb_* we can track any memory that becomes unusable, even
>>> + * when a higher level of the page table was modified.
>>> + *
>>> + * So, we simply track updates using both methods and mark the memory area to
>>> + * be synced later on. The only special case is that flush_tlb_kern_* needs to
>>> + * be executed immediately as there is no good synchronization point in that
>>> + * case. In contrast, in the set_ptes case we can wait for the next kernel
>>> + * segfault before we do the synchronization.
>>> *
>>> - * - flush_tlb() flushes the current mm struct TLBs
>>> * - flush_tlb_all() flushes all processes TLBs
>>> * - flush_tlb_mm(mm) flushes the specified mm context TLB's
>>> * - flush_tlb_page(vma, vmaddr) flushes one page
>>> - * - flush_tlb_kernel_vm() flushes the kernel vm area
>>> * - flush_tlb_range(vma, start, end) flushes a range of pages
>>> + * - flush_tlb_kernel_range(start, end) flushes a range of kernel pages
>>> */
>>>
>>> +extern int um_tlb_sync(struct mm_struct *mm);
>>> +
>>> extern void flush_tlb_all(void);
>>> extern void flush_tlb_mm(struct mm_struct *mm);
>>> -extern void flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
>>> - unsigned long end);
>>> -extern void flush_tlb_page(struct vm_area_struct *vma, unsigned long address);
>>> -extern void flush_tlb_kernel_vm(void);
>>> -extern void flush_tlb_kernel_range(unsigned long start, unsigned long end);
>>> -extern void __flush_tlb_one(unsigned long addr);
>>> +
>>> +static void flush_tlb_page(struct vm_area_struct *vma, unsigned long address)
>>> +{
>>> +	um_tlb_mark_sync(vma->vm_mm, address, address + PAGE_SIZE);
>>> +}
>>> +
>>> +static void flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
>>> +			    unsigned long end)
>>> +{
>>> +	um_tlb_mark_sync(vma->vm_mm, start, end);
>>> +}
>>> +
>>> +static void flush_tlb_kernel_range(unsigned long start, unsigned long end)
>>> +{
>>> +	um_tlb_mark_sync(&init_mm, start, end);
>>> +
>>> +	/* Kernel needs to be synced immediately */
>>> +	um_tlb_sync(&init_mm);
>>> +}
>>
>> Nit: this is a header file, these functions should be defined as inline functions.
>
> Yup, thanks!
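>
> Something like this for v2, then (untested sketch, same bodies as
> above, just made inline):
>
> 	static inline void flush_tlb_page(struct vm_area_struct *vma,
> 					  unsigned long address)
> 	{
> 		um_tlb_mark_sync(vma->vm_mm, address, address + PAGE_SIZE);
> 	}
>
> 	static inline void flush_tlb_range(struct vm_area_struct *vma,
> 					   unsigned long start,
> 					   unsigned long end)
> 	{
> 		um_tlb_mark_sync(vma->vm_mm, start, end);
> 	}
>
> 	static inline void flush_tlb_kernel_range(unsigned long start,
> 						  unsigned long end)
> 	{
> 		um_tlb_mark_sync(&init_mm, start, end);
>
> 		/* Kernel needs to be synced immediately */
> 		um_tlb_sync(&init_mm);
> 	}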
>
>>> diff --git a/arch/um/kernel/tlb.c b/arch/um/kernel/tlb.c
>>> index c137ff6f84dd..232aa7601d5d 100644
>>> --- a/arch/um/kernel/tlb.c
>>> +++ b/arch/um/kernel/tlb.c
>> [...]
>>>
>>> -void flush_tlb_kernel_range(unsigned long start, unsigned long end)
>>> -{
>>> -	flush_tlb_kernel_range_common(start, end);
>>> -}
>>> -
>>> -void flush_tlb_kernel_vm(void)
>>> -{
>>> -	flush_tlb_kernel_range_common(start_vm, end_vm);
>>> -}
>>
>> The build breaks with this change, as there is still a call to
>> flush_tlb_kernel_vm() in ubd.
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/um/drivers/ubd_kern.c?id=fb5d1d389c9e78d68f1f71f926d6251017579f5b#n774
>
> Oh, thanks for the pointer!
>
> I do not see a good reason for that call to even exist. My best theory
> right now is that it existed to avoid later pagefaults for new memory
> regions (the vmalloc area?), i.e. a workaround that is not needed
> anymore with this patch.
It has been there since prehistoric times. No idea why or what it is
doing. IMHO it is not needed.
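If it really is unneeded, the follow-up is presumably just deleting the
call at the spot Tiwei linked (untested, assuming nothing in ubd relies
on the flush):

	--- a/arch/um/drivers/ubd_kern.c
	+++ b/arch/um/drivers/ubd_kern.c
	@@ (at the line linked above)
	-	flush_tlb_kernel_vm();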
>
> Benjamin
--
Anton R. Ivanov
https://www.kot-begemot.co.uk/