[PATCH v2 0/5] mm: reduce mmap_lock contention and improve page fault performance
Barry Song
baohua at kernel.org
Sun May 17 01:45:15 PDT 2026
On Sat, May 2, 2026 at 1:58 AM Matthew Wilcox <willy at infradead.org> wrote:
>
> On Sat, May 02, 2026 at 01:44:34AM +0800, Barry Song wrote:
> > On Fri, May 1, 2026 at 10:57 PM Matthew Wilcox <willy at infradead.org> wrote:
> > >
> > > On Fri, May 01, 2026 at 06:49:58AM +0800, Barry Song wrote:
> > > > 1. There is no deterministic latency for I/O completion. It depends on
> > > > both the hardware and the software stack (bio/request queues and the
> > > > block scheduler). Sometimes the latency is short; at other times it can
> > > > be quite long. In such cases, a high-priority thread performing operations
> > > > such as mprotect, unmap, prctl_set_vma, or madvise may be forced to wait
> > > > for an unpredictable amount of time.
> > >
> > > But does that actually happen? I find it hard to believe that thread A
> > > unmaps a VMA while thread B is in the middle of taking a page fault in
> > > that same VMA. mprotect() and madvise() are more likely to happen, but
> > > it still seems really unlikely to me.
> >
> > It doesn’t have to involve unmapping or applying mprotect to
> > the entire VMA—just a portion of it is sufficient.
>
> Yes, but that still fails to answer "does this actually happen". How much
> performance is all this complexity in the page fault handler buying us?
> If you don't answer this question, I'm just going to go in and rip it
> all out.
>
Hi Matthew (and Lorenzo, Jan, and anyone else who may be
waiting for answers),
As promised during LSF/MM/BPF, we conducted thorough
testing on Android phones to determine whether performing
I/O in `filemap_fault()` can block `vma_start_write()`.
I wanted to give a quick update on this question.
Nanzhe at Xiaomi created tracing scripts and ran various
applications on Android devices with I/O performed under
the VMA lock in `filemap_fault()`. We found that:
1. There are very few cases where unmap() is blocked by
page faults. I assume this is due to buggy user code
or poor synchronization between reads and unmap().
So I assume it is not a problem.
2. We observed many cases where `vma_start_write()`
is blocked by page-fault I/O in some applications.
The blocking occurs in the `dup_mmap()` path during
fork().
With Suren's commit fb49c455323ff ("fork: lock VMAs of
the parent process when forking"), we now always hold
`vma_write_lock()` for each VMA. Note that the
`mmap_lock` write lock is also held, which could lead to
chained waiting if page-fault I/O is performed without
releasing the VMA lock.
My gut feeling is that Suren's commit may be overshooting,
so my rough idea is that we might want to do something like
the following (we haven't tested it yet and it might be
wrong):
diff --git a/mm/mmap.c b/mm/mmap.c
index 2311ae7c2ff4..5ddaf297f31a 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1762,7 +1762,13 @@ __latent_entropy int dup_mmap(struct mm_struct
*mm, struct mm_struct *oldmm)
for_each_vma(vmi, mpnt) {
struct file *file;
- retval = vma_start_write_killable(mpnt);
+ /*
+ * For anonymous or writable private VMAs, prevent
+ * concurrent CoW faults.
+ */
+ if (!mpnt->vm_file || (!(mpnt->vm_flags & VM_SHARED) &&
+ (mpnt->vm_flags & VM_WRITE)))
+ retval = vma_start_write_killable(mpnt);
if (retval < 0)
goto loop_out;
if (mpnt->vm_flags & VM_DONTCOPY) {
Based on the above, we may want to re-check whether fork()
can be blocked by page faults. At the same time, if Suren,
you, or anyone else has any comments, please feel free to
share them.
Best Regards
Barry
More information about the linux-riscv
mailing list