[RFC PATCH 0/3] um: clean up mm creation - another attempt
Anton Ivanov
anton.ivanov at cambridgegreys.com
Tue Sep 26 06:04:24 PDT 2023
On 26/09/2023 13:38, Johannes Berg wrote:
> On Tue, 2023-09-26 at 13:16 +0100, Anton Ivanov wrote:
>>
>> For the time being it is mostly negative :)
>
> Oh well :)
>
>> 1. The performance after the mm patch is down. By 30-40% on my standard bench.
>
> For the record, you mean the three-patch series that this thread is
> about?
Yes. It has no stability issues, either on its own or with the PREEMPT patch on top.
>
>
> Btw, Benjamin realized that MADV_DONTFORK is broken in UML, precisely
> _because_ we fork/copy the whole mm process and then try to fix it up.
> But we can only fix up things that actually have VMAs, and of course
> there are no VMAs with VM_DONTCOPY (set by MADV_DONTFORK) in the new mm
> after fork.
>
> To fix this, really we should either
>
> 1. Start from scratch, without copying, which my other patch [1] did.
>
> [1] https://lore.kernel.org/all/20230922131638.2c57ec713d1c.Id11dff4b349e6a8f0136bb6bb09f6e01a80befbb@changeid/
>
> But of course that's more expensive because we now have to page-fault
> everything in the new process, and page faults are expensive.
>
> 2. Compare the new mm and the old mm, which requires putting it into
> arch_dup_mmap() like these patches here - where I'm not sure I
> understand at all why they cause a perf regression - and remove the
> VMAs that are marked VM_DONTCOPY in the old one.
>
>
> To be honest I don't really like _either_ of these approaches, nor the
> current "fork the process" approach that UML takes. It's very magic, and
> very much works around how Linux works.
+1
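
For reference, a minimal sketch of the MADV_DONTFORK semantics discussed above, as a native Linux kernel is expected to provide them: a region marked MADV_DONTFORK has no VMA in the child, so the child faults when it touches it. This is a plain userspace test, not UML code, and the function name is made up for illustration.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: maps one anonymous page, marks it MADV_DONTFORK,
 * forks, and lets the child read from the page.
 *
 * Returns 1 if the child was killed by SIGSEGV (region not inherited),
 * 0 if the child could read the page, -1 on a setup error.
 */
int child_faults_on_dontfork_region(void)
{
	size_t len = (size_t)sysconf(_SC_PAGESIZE);
	char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	int status;
	pid_t pid;

	if (p == MAP_FAILED)
		return -1;

	/* Touch the page so it is actually populated in the parent. */
	memset(p, 0x5a, len);

	if (madvise(p, len, MADV_DONTFORK) != 0) {
		munmap(p, len);
		return -1;
	}

	pid = fork();
	if (pid < 0) {
		munmap(p, len);
		return -1;
	}

	if (pid == 0) {
		/* Child: make sure SIGSEGV has its default (fatal) action. */
		signal(SIGSEGV, SIG_DFL);
		volatile char c = *(volatile char *)p;
		(void)c;
		_exit(0);
	}

	if (waitpid(pid, &status, 0) < 0) {
		munmap(p, len);
		return -1;
	}

	munmap(p, len);
	return WIFSIGNALED(status) && WTERMSIG(status) == SIGSEGV;
}
```

Calling this from a small main() and printing the result is enough. On a native kernel the expected result is 1; a result of 0 inside a UML guest might be one way the breakage described above shows up.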
>
> Remember that basically the mm process contents should match the page
> tables in the VMAs; but this is decidedly not true where fork() is
> involved, because while the VMAs are copied, most of the page tables are
> _not_ copied. Thus, we have a situation where after fork we don't take
> page faults in UML that we would take in a normal system (this part is
> good for performance), and I believe also vice versa, which would then
> perhaps explain the flush_tlb_page() in handle_page_fault(), because
> honestly I don't otherwise have an explanation for it.
>
>
> I think the better approach for correctness and integration into the
> kernel would be to actually admit that UML is special because page
> faults are so expensive, and
>
> * start with a fresh mm process every time
> * have vma_needs_copy() return true
> * completely fill the mappings according to only the new mm's VMAs
> in arch_dup_mmap() or perhaps later
>
> I don't know how that'd behave wrt. performance, though it likely cannot
> be better than with these patches, but at least it'd be more correct,
> and more obviously correct too, for starters, because then the actual
> mappings in the UML mm process would actually reflect the PTEs that
> Linux knows about.
We can try that.
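
To make the "fill the mappings only from the new mm's VMAs" idea concrete, here is a toy userspace model. It is not kernel code: struct toy_vma, TOY_VM_DONTCOPY and both functions are invented for illustration, and the real work would happen in arch_dup_mmap() (or later) against the actual VMAs. The only point is that if the host-side mappings are derived from the child's VMA list, a region the parent marked MADV_DONTFORK never shows up in the child.

```c
#include <stddef.h>

/* Toy stand-in for the kernel's VM_DONTCOPY flag (not the real value). */
#define TOY_VM_DONTCOPY 0x1u

/* Toy stand-in for a VMA: an address range plus flags. */
struct toy_vma {
	unsigned long start;
	unsigned long end;	/* exclusive */
	unsigned int flags;
};

/*
 * Build the child's VMA list from the parent's, skipping anything marked
 * TOY_VM_DONTCOPY. Returns the number of VMAs written to 'child', or -1
 * if 'child' has room for fewer than that.
 */
int toy_dup_vmas(const struct toy_vma *parent, size_t n_parent,
		 struct toy_vma *child, size_t child_cap)
{
	size_t n = 0;

	for (size_t i = 0; i < n_parent; i++) {
		if (parent[i].flags & TOY_VM_DONTCOPY)
			continue;
		if (n == child_cap)
			return -1;
		child[n++] = parent[i];
	}
	return (int)n;
}

/*
 * Model of "is this address mapped in the mm process": true only if some
 * VMA in the list covers it, i.e. mappings follow the VMAs and nothing else.
 */
int toy_addr_mapped(const struct toy_vma *vmas, size_t n, unsigned long addr)
{
	for (size_t i = 0; i < n; i++) {
		if (addr >= vmas[i].start && addr < vmas[i].end)
			return 1;
	}
	return 0;
}
```

With a parent list of three ranges where the middle one carries TOY_VM_DONTCOPY, the child list ends up with two entries and an address inside the middle range is reported as unmapped in the child while still mapped in the parent.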
>
>
>> 2. The preemption patches work fine on top (all 3 cases). The performance difference stays.
>
> OK.
>
>> 3. We do not have anything of value to add in terms of cond_resched() to the drivers :(
>> Most drivers are fairly simplistic with no safe points to add this.
>
> Yeah, not surprised by this.
>
>> 6. Do we still need force_flush_all() in the arch_dup_mmap()? This works with a non-forced tlb flush
>> using flush_tlb_mm(mm);
>
> Maybe not, does it make a difference though?
Nope. Same numbers in both cases.
>
>> 7. In all cases, UML is doing something silly.
>> The CPU usage while running find -type f -exec cat {} > /dev/null, measured from outside, stays around
>> 8-15% for both non-preemptive and PREEMPT_VOLUNTARY. UML takes a sabbatical for the remaining 85% instead
>> of actually doing work. PREEMPT is slightly better at 60%, but still far from 100%. It just keeps going
>> into idle and I cannot understand why.
>
> Is it just waiting for IO?
Nope. Nearly all I see in strace is wait4 and ptrace calls. The epoll_wait calls are few and far between.
The bottleneck is mm and vm, not IO :(
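
For anyone wanting to reproduce the "measured from outside" CPU figure, a rough sketch of one way to do it: sample utime and stime from /proc/<pid>/stat on the host twice and divide the difference by the wall time. Assumptions: the function names are hypothetical, this looks at a single host PID only (UML typically runs more than one host process, so a real measurement would need to sum them), and ticks_per_sec would come from sysconf(_SC_CLK_TCK).

```c
#include <stdio.h>
#include <string.h>

/*
 * Extract utime + stime (fields 14 and 15, in clock ticks) from one line
 * of /proc/<pid>/stat. Parsing starts after the last ')' because the comm
 * field may itself contain spaces or parentheses.
 * Returns the tick sum, or -1 if the line cannot be parsed.
 */
long long stat_cpu_ticks(const char *line)
{
	const char *p = strrchr(line, ')');
	unsigned long long utime, stime;

	if (!p)
		return -1;

	/* Skip: state ppid pgrp session tty_nr tpgid flags
	 *       minflt cminflt majflt cmajflt, then read utime stime. */
	if (sscanf(p + 1,
		   " %*c %*d %*d %*d %*d %*d %*u %*u %*u %*u %*u %llu %llu",
		   &utime, &stime) != 2)
		return -1;

	return (long long)(utime + stime);
}

/*
 * CPU usage in percent of one CPU between two samples.
 * Returns -1.0 if the interval or tick rate is not positive.
 */
double cpu_percent(long long ticks_before, long long ticks_after,
		   double wall_seconds, long ticks_per_sec)
{
	if (wall_seconds <= 0.0 || ticks_per_sec <= 0)
		return -1.0;
	return 100.0 * (double)(ticks_after - ticks_before) /
	       ((double)ticks_per_sec * wall_seconds);
}
```

As a sanity check on the arithmetic: 300 ticks consumed over 20 seconds at 100 ticks per second comes out at 15%.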
>
> johannes
>
--
Anton R. Ivanov
Cambridgegreys Limited. Registered in England. Company Number 10273661
https://www.cambridgegreys.com/