[RFC PATCH 0/3] um: clean up mm creation - another attempt

Benjamin Berg benjamin at sipsolutions.net
Wed Sep 27 02:52:16 PDT 2023


Hi,

On Tue, 2023-09-26 at 14:38 +0200, Johannes Berg wrote:
> [SNIP]
> 1. Start from scratch, without copying, which my other patch [1] did.

I really think we should go ahead with that approach. Then follow up
with optimizations.

> [SNIP]
> 
> I think the better approach for correctness and integration into the
> kernel would be to actually admit that UML is special because page
> faults are so expensive, and
> 
>  * start with a fresh mm process every time
>  * have vma_needs_copy() return true
>  * completely fill the mappings according to only the new mm's VMAs
>    in arch_dup_mmap() or perhaps later
> 
> I don't know how that'd behave wrt. performance, though it likely cannot
> be better than with these patches, but at least it'd be more correct,
> and more obviously correct too, for starters, because then the actual
> mappings in the UML mm process would actually reflect the PTEs that
> Linux knows about.

Yes, performance may degrade, but the implementation should be correct
in the first place. Note that even though we looked into it (and e.g.
found that MADV_DONTFORK is handled incorrectly), we have not yet
figured out why the first approach is currently slower, as everything
interesting should be getting unmapped by force_flush_all().
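
To make the "fill the fresh mm only from the new mm's VMAs" idea a bit
more concrete, here is a toy userspace model, not kernel code. All
names (toy_vma, toy_host_mm, toy_fill_fresh_mm) are made up for
illustration. It assumes that do-not-fork VMAs (MADV_DONTFORK, i.e.
VM_DONTCOPY) never reach the child and that only pages Linux considers
present get mapped on the host side.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_PAGES 8

/* Hypothetical stand-in for a VMA; not the kernel's vm_area_struct. */
struct toy_vma {
	size_t npages;
	bool dontfork;                   /* models MADV_DONTFORK / VM_DONTCOPY */
	bool pte_present[TOY_MAX_PAGES]; /* pages Linux believes are mapped */
};

/* Hypothetical stand-in for the host-side address space of one mm. */
struct toy_host_mm {
	size_t mapped; /* pages currently mapped in the host process */
};

/*
 * Start from an empty host mm and map exactly the pages that have a
 * present PTE in a VMA that is inherited across fork. Returns the
 * number of pages mapped.
 */
static size_t toy_fill_fresh_mm(struct toy_host_mm *host,
				const struct toy_vma *vmas, size_t nvmas)
{
	host->mapped = 0; /* fresh mm: nothing is inherited implicitly */

	for (size_t i = 0; i < nvmas; i++) {
		assert(vmas[i].npages <= TOY_MAX_PAGES);

		if (vmas[i].dontfork)
			continue; /* must not show up in the child at all */

		for (size_t p = 0; p < vmas[i].npages; p++)
			if (vmas[i].pte_present[p])
				host->mapped++;
	}
	return host->mapped;
}
```

The point of the model is only that the host mappings end up being a
function of the PTEs Linux knows about and nothing else, which is what
makes this approach easier to reason about.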

Once we are there, we can look for optimizations. The fundamental
problem is that page faults (even minor ones) are extremely expensive
for us.
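
For anyone who wants to put a number on this, a rough userspace sketch
along these lines could be run on the host and inside UML to compare.
The function name first_touch_cost_ns is made up, and the result is
only an average over the first touch of each page in a fresh anonymous
mapping, so treat it as an estimate rather than a proper benchmark.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

/*
 * Touch every page of a fresh anonymous mapping once and return the
 * average wall-clock cost per first touch in nanoseconds, or -1.0 if
 * the mapping could not be created.
 */
static double first_touch_cost_ns(size_t npages)
{
	long page = sysconf(_SC_PAGESIZE);
	struct timespec t0, t1;
	unsigned char *buf;
	size_t len;

	assert(npages > 0 && page > 0);
	len = npages * (size_t)page;

	buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1.0;

#ifdef MADV_NOHUGEPAGE
	/* Best effort: keep huge pages from hiding per-page faults. */
	(void)madvise(buf, len, MADV_NOHUGEPAGE);
#endif

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (size_t i = 0; i < npages; i++)
		((volatile unsigned char *)buf)[i * (size_t)page] = 1;
	clock_gettime(CLOCK_MONOTONIC, &t1);

	munmap(buf, len);

	return ((double)(t1.tv_sec - t0.tv_sec) * 1e9 +
		(double)(t1.tv_nsec - t0.tv_nsec)) / (double)npages;
}
```

Calling e.g. first_touch_cost_ns(4096) a few times and taking the
minimum should give a usable per-fault figure for comparison.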

Just throwing out ideas on what we could do:
   1. SECCOMP, as that reduces the number of context switches.
      (Yes, I know I should resubmit the patchset)
   2. Maybe we can disable/cripple page access tracking? If we
      initially mark all pages as accessed by userspace (i.e.
      pte_mkyoung), then we avoid a minor page fault on first access.
      Doing that will mess with page eviction though.
   3. Do DAX (direct_access) for files, i.e. mmap files directly in the
      host kernel rather than through UM.
      With a hostfs-like file system, one should be able to add an
      intermediate block device that maps host files to physical pages,
      then do DAX in the FS.
      For disk images, the existing iomem infrastructure should be
      usable; this should work with any DAX-enabled filesystem (ext2,
      ext4, xfs, virtiofs, erofs).
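
To illustrate idea 2, here is a toy userspace model, not kernel code.
The toy_* names are made up; the real helpers are pte_young() and
pte_mkyoung(). It assumes the accessed bit is emulated in software, so
a page that is present but not young is inaccessible on the host until
a fault marks it young.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy PTE with only the two bits the model needs. */
struct toy_pte {
	bool present;
	bool young;
};

/*
 * Model one userspace access to a present page. Returns the number of
 * minor faults taken (0 or 1).
 */
static unsigned toy_access(struct toy_pte *pte)
{
	assert(pte->present);

	if (pte->young)
		return 0;      /* already accessible, no fault */

	pte->young = true;     /* what the fault handler would do */
	return 1;
}

/*
 * Access each of npages pages twice and count the minor faults, with
 * or without pre-marking the PTEs young when they are installed.
 */
static unsigned toy_first_touch_faults(size_t npages, bool premark_young)
{
	unsigned faults = 0;

	for (size_t i = 0; i < npages; i++) {
		struct toy_pte pte = { .present = true, .young = premark_young };

		faults += toy_access(&pte); /* first access */
		faults += toy_access(&pte); /* later accesses never fault */
	}
	return faults;
}
```

In this model, 16 pages cost 16 faults without pre-marking and none
with it. The price is that every page looks recently used from the
start, which is why this would interfere with page eviction.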

Benjamin

> 
> > 2. The preemption patches work fine on top (all 3 cases). The
> > performance difference stays.
> 
> OK.
> 
> > 3. We do not have anything of value to add in terms of
> > cond_resched() to the drivers :(
> > Most drivers are fairly simplistic with no safe points to add this.
> 
> Yeah, not surprised by this.
> 
> > 6. Do we still need force_flush_all() in the arch_dup_mmap()? This
> > works with a non-forced tlb flush
> > using flush_tlb_mm(mm);
> 
> Maybe not, does it make a difference though?
> 
> > 7. In all cases, UML is doing something silly.
> > The CPU usage while doing find -type f -exec cat {} > /dev/null
> > measured from outside in non-preemptive and
> > PREEMPT_VOLUNTARY stays around 8-15%. The UML takes a sabbatical
> > for the remaining 85% instead of actually
> > doing work. PREEMPT is slightly better at 60%, but still far from
> > 100%. It just keeps going into idle and I
> > cannot understand why.
> 
> Is it just waiting for IO?
> 
> johannes