[RFC PATCH 0/3] um: clean up mm creation - another attempt

Anton Ivanov anton.ivanov at cambridgegreys.com
Wed Jan 17 11:45:20 PST 2024


On 17/01/2024 17:17, Benjamin Berg wrote:
> Hi,
> 
> On Wed, 2023-09-27 at 11:52 +0200, Benjamin Berg wrote:
>> [SNIP]
>> Once we are there, we can look for optimizations. The fundamental
>> problem is that page faults (even minor ones) are extremely expensive
>> for us.
>>
>> Just throwing out ideas on what we could do:
>>     1. SECCOMP as that reduces the amount of context switches.
>>        (Yes, I know I should resubmit the patchset)
>>     2. Maybe we can disable/cripple page access tracking? If we
>>        initially mark all pages as accessed by userspace (i.e.
>>        pte_mkyoung), then we avoid a minor page fault on first access.
>>        Doing that will mess with page eviction though.
>>     3. Do DAX (direct_access) for files. i.e. mmap files directly in the
>>        host kernel rather than through UM.
>>        With a hostfs like file system, one should be able to add an
>>        intermediate block device that maps host files to physical pages,
>>        then do DAX in the FS.
>>        For disk images, the existing iomem infrastructure should be
>>        usable, this should work with any DAX enabled filesystems (ext2,
>>        ext4, xfs, virtiofs, erofs).
> 
> So, I experimented quite a bit over Christmas (including getting DAX to
> work with virtiofs). At the end of all this my conclusion is that
> insufficient page table synchronization is our main problem.
> 
> Basically, right now we rely on the flush_tlb_* functions from the
> kernel, but these are only called when TLB entries are removed, *not*
> when new PTEs are added (there is also update_mmu_cache, but it isn't
> enough either). Effectively this means that new page table entries will
> often only be synced because the userspace code runs into an
> unnecessary segfault.
> 
> Really, what we need is a set_pte_at() implementation that marks the
> memory range for synchronization. Then we can make sure we sync it
> before switching to the userspace process (the equivalent of running
> flush_tlb_mm_range right now).
> 
> I think we should:
>   * Rewrite the userspace syscall code
>     - Support delaying the execution of syscalls
>     - Only support mmap/munmap/mprotect and LDT
>     - Do simple compression of consecutive syscalls here
>     - Drop the hand-written assembler
>   * Improve the tlb.c code
>     - remove the HVC abstraction

Cool. That was not working particularly well. I tried to improve it a 
few times, but ripping it out and replacing it is probably a better idea.
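
For reference, the "simple compression of consecutive syscalls" item could look roughly like the sketch below. This is a standalone illustration under my own assumptions, not the existing tlb.c code: struct pending_op, struct op_queue, queue_op() and MAX_OPS are hypothetical names, and only munmap/mprotect are merged, because merging mmap would also require checking that the fd and file offsets are contiguous.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical operation kinds; the real code may name these differently. */
enum op_type { OP_MMAP, OP_MUNMAP, OP_MPROTECT };

/* One delayed host syscall (hypothetical layout). */
struct pending_op {
	enum op_type type;
	unsigned long addr;
	unsigned long len;
	int prot;		/* unused for OP_MUNMAP, pass 0 */
};

#define MAX_OPS 16		/* arbitrary queue depth for the sketch */

struct op_queue {
	struct pending_op ops[MAX_OPS];
	size_t count;
};

/*
 * Queue an operation instead of executing it immediately.
 * If it directly extends the previous munmap/mprotect with the same
 * protection, grow that entry instead of adding a new one.
 * Returns false when the queue is full and must be flushed first.
 */
static bool queue_op(struct op_queue *q, enum op_type type,
		     unsigned long addr, unsigned long len, int prot)
{
	assert(len > 0);

	if (q->count > 0) {
		struct pending_op *last = &q->ops[q->count - 1];

		if (type != OP_MMAP && last->type == type &&
		    last->prot == prot && last->addr + last->len == addr) {
			last->len += len;
			return true;
		}
	}

	if (q->count == MAX_OPS)
		return false;

	q->ops[q->count++] = (struct pending_op){ type, addr, len, prot };
	return true;
}
```

With something like this, two back-to-back munmap requests for adjacent pages end up as a single queued entry, and the whole queue can be executed in one go before returning to userspace.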

>     - never force immediate syscall execution
>   * Let set_pte_at() track which memory ranges need syncing
>   * At that point we should be able to:
>     - drop copy_context_skas0
>     - make flush_tlb_* no-ops
>     - drop flush_tlb_page from handle_page_fault
>     - move unmap() from flush_thread to init_new_context
>       (or do it as part of start_userspace)
> 
> So, I did try this using nasty hacks and IIRC one of my runs was going
> from 21s to 16s and another from 63s to 56s. Which seems like a nice
> improvement.

Excellent. I assume you were using hostfs as usual, right? If so, the 
difference is likely to be even more noticeable on ubd.
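
To make the set_pte_at() idea quoted above a bit more concrete, here is a minimal standalone sketch of range tracking. It is only an illustration under assumptions: struct sync_range, mark_for_sync(), take_sync_range() and PAGE_SZ are hypothetical names, and a real implementation would live in the per-mm context and track more than one coarse range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGE_SZ 4096UL		/* assumed page size for the sketch */

/* Hypothetical per-mm state: one covering range that needs syncing. */
struct sync_range {
	unsigned long start;	/* inclusive, page aligned */
	unsigned long end;	/* exclusive, page aligned */
	bool pending;
};

/* Would be called from a set_pte_at()-like hook for each changed PTE. */
static void mark_for_sync(struct sync_range *r, unsigned long addr)
{
	unsigned long page = addr & ~(PAGE_SZ - 1);

	assert(r != NULL);

	if (!r->pending) {
		r->start = page;
		r->end = page + PAGE_SZ;
		r->pending = true;
		return;
	}

	/* Grow the covering range to include the new page. */
	if (page < r->start)
		r->start = page;
	if (page + PAGE_SZ > r->end)
		r->end = page + PAGE_SZ;
}

/*
 * Would be called right before switching to the userspace process.
 * Returns true and hands out the range if anything needs syncing,
 * then clears the pending state.
 */
static bool take_sync_range(struct sync_range *r,
			    unsigned long *start, unsigned long *end)
{
	assert(r != NULL && start != NULL && end != NULL);

	if (!r->pending)
		return false;

	*start = r->start;
	*end = r->end;
	r->pending = false;
	return true;
}
```

A single covering range is the crudest possible choice: it can over-sync when two distant pages are touched, but the bookkeeping on the PTE update path stays trivial.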

> 
> Benjamin
> 
> 
> PS: As for DAX, it doesn't really seem to help performance. It didn't
> seem to lower the number of page faults in UML. And, from my
> perspective, it isn't really worth it just for the memory sharing.
> 
> PPS: dirty/young tracking seemed to cause only a small number of
> page faults in the grand scheme of things. So probably not something
> worth following up on.
> 

-- 
Anton R. Ivanov
Cambridgegreys Limited. Registered in England. Company Number 10273661
https://www.cambridgegreys.com/



