[RFC PATCH 0/3] um: clean up mm creation - another attempt

Tue Sep 26 05:38:45 PDT 2023

On Tue, 2023-09-26 at 13:16 +0100, Anton Ivanov wrote:
> 
> For the time being it is mostly negative :)

Oh well :)

> 1. The performance after the mm patch is down. By 30-40% on my standard bench.

For the record, you mean the three-patch series that this thread is
about?

Btw, Benjamin realized that MADV_DONTFORK is broken in UML, precisely
_because_ we fork/copy the whole mm process and then try to fix it up.
But we can only fix up things that actually have VMAs, and of course
there are no VMAs with VM_DONTCOPY (set by MADV_DONTFORK) in the new mm
after fork.
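
For reference, what MADV_DONTFORK is supposed to do can be checked from
userspace with something like the sketch below. The function name is
made up and this is not from any patch; on a regular kernel I'd expect
it to return 1, since the range simply doesn't exist in the child:

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only.
 *
 * Returns 1 if a forked child gets SIGSEGV when touching a region the
 * parent marked MADV_DONTFORK, 0 if the child could still read it,
 * and -1 on any setup error.
 */
int child_faults_on_dontfork(void)
{
	long page = sysconf(_SC_PAGESIZE);
	volatile char *p;
	int status;
	pid_t pid;

	p = mmap(NULL, page, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;
	p[0] = 42;	/* make sure the page is present in the parent */

	if (madvise((void *)p, page, MADV_DONTFORK))
		return -1;

	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0) {
		/* child: this read should fault, the VMA was not copied */
		_exit(p[0] == 42 ? 0 : 1);
	}

	if (waitpid(pid, &status, 0) != pid)
		return -1;
	munmap((void *)p, page);

	return WIFSIGNALED(status) && WTERMSIG(status) == SIGSEGV;
}
```

If the breakage is what I think it is, the same function would return 0
when run inside UML, because the host mapping survives the fork even
though there's no VMA for it.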

To fix this, really we should either

1. Start from scratch, without copying, which my other patch [1] did.

   [1] https://lore.kernel.org/all/20230922131638.2c57ec713d1c.Id11dff4b349e6a8f0136bb6bb09f6e01a80befbb@changeid/

   But of course that's more expensive because we now have to page-fault
   everything in the new process, and page faults are expensive.

2. Compare the new mm and the old mm, and remove the VMAs that are
   marked VM_DONTCOPY in the old one. This requires putting it into
   arch_dup_mmap() like these patches here do, though I'm not sure I
   understand at all why they cause a perf regression.
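
As a toy userspace model of that second option (all names here are
invented, TOY_VM_DONTCOPY is just a stand-in for the kernel flag, and
none of this is from the patches), the comparison boils down to walking
the old mm's VMAs and collecting the ranges that have to be removed
from the copied mm process:

```c
#include <stddef.h>

#define TOY_VM_DONTCOPY	0x1UL	/* stand-in for the kernel's VM_DONTCOPY */

struct toy_vma {
	unsigned long start, end;	/* covers [start, end) */
	unsigned long flags;
};

struct toy_range {
	unsigned long start, end;
};

/*
 * Walk the *old* mm's VMAs and collect the ranges that the host fork
 * copied into the new mm process although they must not be there.
 * Returns the number of ranges written to out[] (at most max).
 */
size_t toy_dontcopy_ranges(const struct toy_vma *old, size_t n,
			   struct toy_range *out, size_t max)
{
	size_t i, cnt = 0;

	for (i = 0; i < n && cnt < max; i++) {
		if (!(old[i].flags & TOY_VM_DONTCOPY))
			continue;
		out[cnt].start = old[i].start;
		out[cnt].end = old[i].end;
		cnt++;
	}

	return cnt;
}
```

The point is only that the information lives in the old mm; the new mm
has nothing to look at for these ranges.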

To be honest I don't really like _either_ of these approaches, nor the
current "fork the process" approach that UML takes. It's very magic, and
very much works around how Linux works.

Remember that basically the mm process contents should match the page
tables in the VMAs; but this is decidedly not true where fork() is
involved, because while the VMAs are copied, most of the page tables are
_not_ copied. Thus, after fork we don't take page faults in UML that we
would take in a normal system (this part is good for performance). I
believe the reverse also happens, i.e. we take faults in UML that a
normal system would not take, which would then perhaps explain the
flush_tlb_page() in handle_page_fault(), because honestly I don't
otherwise have an explanation for it.

I think the better approach for correctness and integration into the
kernel would be to actually admit that UML is special because page
faults are so expensive, and

 * start with a fresh mm process every time
 * have vma_needs_copy() return true
 * completely fill the mappings according to only the new mm's VMAs
   in arch_dup_mmap() or perhaps later

I don't know how that'd behave wrt. performance; it likely cannot be
better than with these patches. But at least it'd be more correct, and
more obviously correct too, because then the mappings in the UML mm
process would actually reflect the PTEs that Linux knows about.
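
To make the "fill the mappings from the new mm only" idea a bit more
concrete, here's a toy userspace model (plain C, every name is invented
for illustration, nothing like this exists in the tree). The fresh mm
process starts out empty and gets exactly one mapping per present PTE
of the new mm, so the two cannot disagree by construction:

```c
#include <stddef.h>

struct toy_pte {
	unsigned long vaddr;
	unsigned long phys;
	int present;
};

struct toy_host_map {
	unsigned long vaddr;
	unsigned long phys;
};

/*
 * Fill a fresh (empty) host process from the new mm's PTEs only.
 * Returns the number of mappings written to host[] (at most max).
 */
size_t toy_fill_fresh_mm(const struct toy_pte *ptes, size_t n,
			 struct toy_host_map *host, size_t max)
{
	size_t i, cnt = 0;

	for (i = 0; i < n && cnt < max; i++) {
		if (!ptes[i].present)
			continue;	/* no PTE, no mapping; faulted in later */
		host[cnt].vaddr = ptes[i].vaddr;
		host[cnt].phys = ptes[i].phys;
		cnt++;
	}

	return cnt;
}
```

Anything that never got a PTE in the new mm (which would include the
MADV_DONTFORK ranges) simply never shows up in the host process.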

> 2. The preemption patches work fine on top (all 3 cases). The performance difference stays.

OK.

> 3. We do not have anything of value to add in term of cond_resched() to the drivers :(
> Most drivers are fairly simplistic with no safe points to add this.

Yeah, not surprised by this.

> 6. Do we still need force_flush_all() in the arch_dup_mmap()? This works with a non-forced tlb flush
> using flush_tlb_mm(mm);

Maybe not, does it make a difference though?

> 7. In all cases, UML is doing something silly.
> The CPU usage while doing find -type f -exec cat {} > /dev/null measured from outside in non-preemptive and
> PREEMPT_VOLUNTARY stays around 8-15%. The UML takes a sabbatical for the remaining 85 instead of actually
> doing work. PREEMPT is slightly better at 60, but still far from 100%. It just keeps going into idle and I
> cannot understand why.

Is it just waiting for IO?

johannes