[RFC PATCH 0/3] um: clean up mm creation - another attempt

Anton Ivanov anton.ivanov at cambridgegreys.com
Tue Sep 26 06:04:24 PDT 2023



On 26/09/2023 13:38, Johannes Berg wrote:
> On Tue, 2023-09-26 at 13:16 +0100, Anton Ivanov wrote:
>>
>> For the time being it is mostly negative :)
> 
> Oh well :)
> 
>> 1. Performance after the mm patch is down by 30-40% on my standard bench.
> 
> For the record, you mean this three-patch series that we're discussing
> in the thread of?

Yes. It has no stability issues, either on its own or with the PREEMPT patch on top.

> 
> 
> Btw, Benjamin realized that MADV_DONTFORK is broken in UML, precisely
> _because_ we fork/copy the whole mm process and then try to fix it up.
> But we can only fix up things that actually have VMAs, and of course
> there are no VMAs with VM_DONTCOPY (set by MADV_DONTFORK) in the new mm
> after fork.
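
For reference, here is a minimal userspace sketch of the MADV_DONTFORK semantics at issue. The function name child_can_read() is purely illustrative. The sketch maps and touches one anonymous page, optionally marks it MADV_DONTFORK, forks, and has the child try to read the page. On a native kernel the child should fault on the DONTFORK page. On a UML kernel affected by the problem described above, one would expect the read to succeed instead, because the copied host process still contains the mapping even though the child mm has no VMA for it.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <signal.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Child-side SIGSEGV handler: the read faulted, so report "not mapped". */
static void on_segv(int sig)
{
    (void)sig;
    _exit(0);
}

/*
 * Map and touch one anonymous page, optionally mark it MADV_DONTFORK,
 * fork, and have the child read the page.
 * Returns 1 if the child could read it, 0 if the read faulted, -1 on error.
 */
static int child_can_read(int dontfork)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    p[0] = 'x'; /* populate the page before forking */

    if (dontfork && madvise(p, len, MADV_DONTFORK) != 0) {
        munmap(p, len);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        munmap(p, len);
        return -1;
    }
    if (pid == 0) {
        /* Child: a missing mapping turns this read into SIGSEGV. */
        signal(SIGSEGV, on_segv);
        volatile char c = *(volatile char *)p;
        (void)c;
        _exit(1);
    }

    int status;
    pid_t w = waitpid(pid, &status, 0);
    munmap(p, len);
    if (w != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

On a native Linux host, child_can_read(0) should return 1 and child_can_read(1) should return 0.
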
> 
> To fix this, really we should either
> 
> 1. Start from scratch, without copying, which my other patch [1] did.
> 
>     [1] https://lore.kernel.org/all/20230922131638.2c57ec713d1c.Id11dff4b349e6a8f0136bb6bb09f6e01a80befbb@changeid/
> 
>     But of course that's more expensive because we now have to page-fault
>     everything in the new process, and page faults are expensive.
> 
> 2. Compare the new mm and the old mm, which requires putting it into
>     arch_dup_mmap() like these patches here - where I'm not sure I
>     understand at all why they cause a perf regression - and remove the
>     VMAs that are marked VM_DONTCOPY in the old one.
> 
> 
> To be honest I don't really like _either_ of these approaches, nor the
> current "fork the process" approach that UML takes. It's very magic, and
> very much works around how Linux works.

+1

> 
> Remember that basically the mm process contents should match the page
> tables in the VMAs; but this is decidedly not true where fork() is
> involved, because while the VMAs are copied, most of the page tables are
> _not_ copied. Thus, we have a situation where after fork we don't take
> page faults in UML that we would take in a normal system (this part is
> good for performance), and I believe also vice versa, which would then
> perhaps explain the flush_tlb_page() in handle_page_fault(), because
> honestly I don't otherwise have an explanation for it.
> 
> 
> I think the better approach for correctness and integration into the
> kernel would be to actually admit that UML is special because page
> faults are so expensive, and
> 
>   * start with a fresh mm process every time
>   * have vma_needs_copy() return true
>   * completely fill the mappings according to only the new mm's VMAs
>     in arch_dup_mmap() or perhaps later
> 
> I don't know how that'd behave wrt. performance, though it likely cannot
> be better than with these patches, but at least it'd be more correct,
> and more obviously correct too, for starters, because then the actual
> mappings in the UML mm process would actually reflect the PTEs that
> Linux knows about.

We can try that.
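
The following is a toy userspace model of that approach. It is not kernel code: struct model_vma, MODEL_VM_DONTCOPY and plan_child_mappings() are hypothetical names invented for illustration. The idea it shows is that the fresh host process is filled solely from the VMAs the new mm actually inherits. As a result, a region marked with MADV_DONTFORK is never mapped in the first place and needs no fix-up afterwards.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bit for this model only (not the kernel's VM_DONTCOPY value). */
#define MODEL_VM_DONTCOPY 0x1u

/* Hypothetical, simplified stand-in for a VMA: an address range plus flags. */
struct model_vma {
    unsigned long start;
    unsigned long end; /* exclusive */
    unsigned int flags;
};

/*
 * Given the parent's VMAs, compute which ranges a fresh host process for
 * the child would have to map: only the VMAs the child inherits.
 * Writes them to out (room for n entries), stores the total size in
 * *bytes, and returns the number of ranges.
 */
static size_t plan_child_mappings(const struct model_vma *parent, size_t n,
                                  struct model_vma *out, unsigned long *bytes)
{
    size_t count = 0;

    *bytes = 0;
    for (size_t i = 0; i < n; i++) {
        /* Not inherited by the child mm, so there is nothing to map. */
        if (parent[i].flags & MODEL_VM_DONTCOPY)
            continue;
        out[count++] = parent[i];
        *bytes += parent[i].end - parent[i].start;
    }
    return count;
}
```

Whether those ranges are then populated eagerly in arch_dup_mmap() or lazily on fault is the performance trade-off discussed in this thread. The model says nothing about that cost.
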

> 
> 
>> 2. The preemption patches work fine on top (all 3 cases). The performance difference stays.
> 
> OK.
> 
>> 3. We do not have anything of value to add in terms of cond_resched() to the drivers :(
>> Most drivers are fairly simplistic with no safe points to add this.
> 
> Yeah, not surprised by this.
> 
>> 6. Do we still need force_flush_all() in arch_dup_mmap()? It works with a non-forced TLB flush
>> using flush_tlb_mm(mm) instead.
> 
> Maybe not, does it make a difference though?

Nope. Same numbers in both cases.

> 
>> 7. In all cases, UML is doing something silly.
>> The CPU usage while running "find -type f -exec cat {} \; > /dev/null", measured from outside, stays
>> around 8-15% with both non-preemptive and PREEMPT_VOLUNTARY. UML takes a sabbatical for the remaining
>> 85% instead of actually doing work. PREEMPT is slightly better at 60%, but still far from 100%. It just
>> keeps going into idle and I cannot understand why.
> 
> Is it just waiting for IO?

Nope. Nearly all I see in strace is wait4 and ptrace. The epoll_wait calls are few and far between.

The bottleneck is mm and vm, not IO :(
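
One way to reproduce a "measured from outside" CPU figure on the host is to sample utime and stime from /proc/<pid>/stat twice and divide the difference by the elapsed wall time. These are fields 14 and 15, counted in clock ticks. Below is a minimal sketch; parse_stat_times() and cpu_percent() are illustrative names. Note that UML normally runs as several host processes (the kernel and the ptraced userspace side), so the ticks would need to be summed across all of them.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Extract utime and stime (fields 14 and 15, in clock ticks) from one line
 * of /proc/<pid>/stat. The comm field may contain spaces and ')', so
 * parsing starts after the last ')'.
 * Returns 0 on success, -1 if the line could not be parsed.
 */
static int parse_stat_times(const char *line, unsigned long long *utime,
                            unsigned long long *stime)
{
    const char *p = strrchr(line, ')');

    if (p == NULL)
        return -1;
    /* Skip fields 3..13: state, ppid, pgrp, session, tty_nr, tpgid,
     * flags, minflt, cminflt, majflt, cmajflt. */
    if (sscanf(p + 1, " %*c %*d %*d %*d %*d %*d %*u %*lu %*lu %*lu %*lu %llu %llu",
               utime, stime) != 2)
        return -1;
    return 0;
}

/* CPU usage in percent of one CPU between two samples of utime + stime. */
static double cpu_percent(unsigned long long ticks_before,
                          unsigned long long ticks_after,
                          double wall_seconds, long ticks_per_second)
{
    if (wall_seconds <= 0.0 || ticks_per_second <= 0 ||
        ticks_after < ticks_before)
        return -1.0;
    double cpu_seconds = (double)(ticks_after - ticks_before) /
                         (double)ticks_per_second;
    return 100.0 * cpu_seconds / wall_seconds;
}
```

Sampling is left to the caller. For example, it could read the stat file with fopen()/fgets() a second apart and pass sysconf(_SC_CLK_TCK) as ticks_per_second.
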

> 
> johannes
> 

-- 
Anton R. Ivanov
Cambridgegreys Limited. Registered in England. Company Number 10273661
https://www.cambridgegreys.com/
