[RFC PATCH 0/3] um: clean up mm creation - another attempt

Anton Ivanov anton.ivanov at cambridgegreys.com
Wed Sep 27 02:59:19 PDT 2023



On 27/09/2023 10:52, Benjamin Berg wrote:
> Hi,
> 
> On Tue, 2023-09-26 at 14:38 +0200, Johannes Berg wrote:
>> [SNIP]
>> 1. Start from scratch, without copying, which my other patch [1] did.
> 
> I really think we should go ahead with that approach. Then follow up
> with optimizations.

+1

> 
>> [SNIP]
>>
>> I think the better approach for correctness and integration into the
>> kernel would be to actually admit that UML is special because page
>> faults are so expensive, and
>>
>>   * start with a fresh mm process every time
>>   * have vma_needs_copy() return true
>>   * completely fill the mappings according to only the new mm's VMAs
>>     in arch_dup_mmap() or perhaps later
>>
>> I don't know how that'd behave wrt. performance, though it likely cannot
>> be better than with these patches, but at least it'd be more correct,
>> and more obviously correct too, for starters, because then the actual
>> mappings in the UML mm process would actually reflect the PTEs that
>> Linux knows about.
> 
> Yes, performance may degrade, but the implementation should be correct
> in the first place. Note that even though we looked at it (and e.g.
> found that MADV_DONTFORK is incorrect), we have not figured out why the
> first approach is slower currently as everything interesting should be
> getting unmapped by the force_flush_all.
> 
> Once we are there, we can look for optimizations. The fundamental
> problem is that page faults (even minor ones) are extremely expensive
> for us.
> 
> Just throwing out ideas on what we could do:
>     1. SECCOMP as that reduces the amount of context switches.
>        (Yes, I know I should resubmit the patchset)

Actually... YES, YES and YES.

I was just looking at all the workarounds that are in place to prevent
guest processes from doing a syscall on the host. If this is prohibited at
a higher level, we should get quite a boost, as all these PTRACE_PEEKs
will become unnecessary.
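
To illustrate the mechanism being discussed, here is a minimal sketch of a
seccomp filter that stops one syscall before the host kernel executes it and
reports it to the process as SIGSYS, with no ptrace stop involved. It is not
taken from the SECCOMP patchset; the function name seccomp_trap_demo(), the
choice of getppid as the trapped syscall, and the return codes are
assumptions made for the example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t got_sigsys = 0;

/* SIGSYS handler: only records that the filter trapped a syscall. */
static void on_sigsys(int sig)
{
    (void)sig;
    got_sigsys = 1;
}

/*
 * Hypothetical demo helper (not UML code).
 * Returns 1 if the syscall was trapped, 0 if it was not,
 * and -1 if seccomp filtering could not be set up here.
 */
int seccomp_trap_demo(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Trap getppid, allow everything else. A real filter should
         * also check seccomp_data.arch; omitted to keep this short. */
        struct sock_filter prog[] = {
            BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
                     offsetof(struct seccomp_data, nr)),
            BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_getppid, 0, 1),
            BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_TRAP),
            BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
        };
        struct sock_fprog fprog = {
            .len = sizeof(prog) / sizeof(prog[0]),
            .filter = prog,
        };
        struct sigaction sa;

        memset(&sa, 0, sizeof(sa));
        sa.sa_handler = on_sigsys;
        if (sigaction(SIGSYS, &sa, NULL) != 0)
            _exit(2);

        /* Required so an unprivileged process may install a filter. */
        if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0 ||
            prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &fprog) != 0)
            _exit(2);

        /* The syscall is skipped and SIGSYS is delivered instead. */
        syscall(SYS_getppid);
        _exit(got_sigsys ? 0 : 1);
    }

    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    if (WEXITSTATUS(status) == 0)
        return 1;
    if (WEXITSTATUS(status) == 1)
        return 0;
    return -1;
}
```

The filter is installed in a forked child so the calling process stays
unfiltered. Calling seccomp_trap_demo() should return 1 on a host where
seccomp filters are permitted.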

>     2. Maybe we can disable/cripple page access tracking? If we
>        initially mark all pages as accessed by userspace (i.e.
>        pte_mkyoung), then we avoid a minor page fault on first access.
>        Doing that will mess with page eviction though.
>     3. Do DAX (direct_access) for files. i.e. mmap files directly in the
>        host kernel rather than through UM.
>        With a hostfs like file system, one should be able to add an
>        intermediate block device that maps host files to physical pages,
>        then do DAX in the FS.
>        For disk images, the existing iomem infrastructure should be
>        usable, this should work with any DAX enabled filesystems (ext2,
>        ext4, xfs, virtiofs, erofs).

I had some plans to do a ubd gen 2 which uses mmap and/or this. They are
presently way on the backburner. We can do some of that once we push
the new VM changes.
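
As a rough userspace illustration of the mmap idea, the sketch below reads a
file through a shared mapping rather than through read() calls. It is not the
planned ubd code; the helper names sum_file_mmap() and demo_mmap_sum() and
the temporary file path are assumptions made for the example.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: sum all bytes of a file via a read-only mapping.
 * Returns -1 on error or if the file is empty. */
long sum_file_mmap(const char *path)
{
    struct stat st;
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {
        close(fd);
        return -1;
    }

    unsigned char *p = mmap(NULL, (size_t)st.st_size, PROT_READ,
                            MAP_SHARED, fd, 0);
    close(fd);                      /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return -1;

    long sum = 0;
    for (off_t i = 0; i < st.st_size; i++)
        sum += p[i];                /* page cache pages are accessed directly */

    munmap(p, (size_t)st.st_size);
    return sum;
}

/* Hypothetical demo: write n bytes of value v to a temp file,
 * then return the sum computed through the mapping. */
long demo_mmap_sum(size_t n, unsigned char v)
{
    char path[] = "/tmp/mmap_demo_XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;

    unsigned char *buf = malloc(n);
    if (buf == NULL) {
        close(fd);
        unlink(path);
        return -1;
    }
    memset(buf, v, n);

    size_t done = 0;
    while (done < n) {              /* handle short writes */
        ssize_t w = write(fd, buf + done, n - done);
        if (w <= 0)
            break;
        done += (size_t)w;
    }
    free(buf);
    close(fd);

    long sum = (done == n) ? sum_file_mmap(path) : -1;
    unlink(path);
    return sum;
}
```

For instance, demo_mmap_sum(8192, 3) writes two 4 KiB pages of the value 3
and returns 24576.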

> 
> Benjamin
> 
>>
>>> 2. The preemption patches work fine on top (all 3 cases). The
>>> performance difference stays.
>>
>> OK.
>>
>>> 3. We do not have anything of value to add in terms of
>>> cond_resched() to the drivers :(
>>> Most drivers are fairly simplistic with no safe points to add this.
>>
>> Yeah, not surprised by this.
>>
>>> 6. Do we still need force_flush_all() in the arch_dup_mmap()? This
>>> works with a non-forced tlb flush
>>> using flush_tlb_mm(mm);
>>
>> Maybe not, does it make a difference though?
>>
>>> 7. In all cases, UML is doing something silly.
>>> The CPU usage while doing find -type f -exec cat {} > /dev/null,
>>> measured from outside, in non-preemptive and PREEMPT_VOLUNTARY
>>> stays around 8-15%. The UML takes a sabbatical for the remaining
>>> 85% instead of actually doing work. PREEMPT is slightly better at
>>> 60%, but still far from 100%. It just keeps going into idle and I
>>> cannot understand why.
>>
>> Is it just waiting for IO?
>>
>> johannes
>>
>> _______________________________________________
>> linux-um mailing list
>> linux-um at lists.infradead.org
>> http://lists.infradead.org/mailman/listinfo/linux-um
>>
> 
> 

-- 
Anton R. Ivanov
Cambridgegreys Limited. Registered in England. Company Number 10273661
https://www.cambridgegreys.com/


