[PATCH v2 0/5] mm: reduce mmap_lock contention and improve page fault performance
Lorenzo Stoakes
ljs at kernel.org
Wed May 20 01:11:20 PDT 2026
On Tue, May 19, 2026 at 02:02:09PM -0700, Yang Shi wrote:
> On Tue, May 19, 2026 at 11:41 AM Yang Shi <shy828301 at gmail.com> wrote:
> > >
> > > >
> > > > >
> > > > > Secondly, if vma->anon_vma is NULL, it basically means either no page
> > > > > fault happened or no cow happened, so there is no page table to copy,
> > > > > this is also what copy_page_range() does currently. So we can shrink
> > > > > the critical section to:
> > > >
> > > > Firstly, with no VMA write lock, !vma->anon_vma means a fault can race and
> > > > secondly copy_page_range() checks vma_needs_copy(), there are other cases - PFN
> > > > maps, mixed maps, UFFD W/P (ugh), guard regions.
> > > >
> > > > So yeah this isn't sufficient.
> > >
> > > However this is true...
> >
> > Yes, fault can race with fork. Basically this is actually the purpose
> > of this idea. We can have improved page fault scalability. In my
> > proposal (take write vma lock if vma->anon_vma is not NULL), the race
> > just happens on the VMAs which page fault has not happened on before.
>
> Sorry, this is incorrect. Page fault can't happen on those VMAs
> because page fault needs to create anon_vma, but it requires taking
> mmap_lock.
> If anon_vma is not NULL, vma write lock will serialize against page
> fault. So there should be no race with page fault. Removing vma write
> lock suggested by Barry may increase race.
Firstly, let's none of us be worried about making mistakes here; the anon_vma
stuff is confusing, and I've stared at it more than most, and even so I
managed to make mistakes (as corrected here) and forget details :))
It's a sign it all needs simplifying, but hey that's what my scalable CoW
project is (partly) about :)
Removing the VMA write lock would cause races with page fault which can result
in page tables being installed which are then not correctly duplicated for
ranges that must be.
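As a rough illustration of which ranges "must be" duplicated, here is a
userspace toy model (not kernel code) of the cases named earlier in this
thread. Everything prefixed toy_/TOY_ is hypothetical and exists only for this
sketch; the real decision lives in vma_needs_copy() and may consider more than
is modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flags, named after the cases listed in this thread. */
enum toy_vma_flags {
	TOY_PFNMAP   = 1u << 0,	/* PFN map */
	TOY_MIXEDMAP = 1u << 1,	/* mixed map */
	TOY_UFFD_WP  = 1u << 2,	/* userfaultfd write-protect */
	TOY_GUARD    = 1u << 3,	/* guard regions may be present */
};

/* Hypothetical, heavily simplified stand-in for a VMA. */
struct toy_vma {
	unsigned int flags;
	bool has_anon_vma;	/* models vma->anon_vma != NULL */
};

/* Toy model: must fork copy page tables for this range? */
static bool toy_needs_copy(const struct toy_vma *vma)
{
	assert(vma != NULL);

	/* These cases need copying whether or not an anon_vma exists. */
	if (vma->flags & (TOY_PFNMAP | TOY_MIXEDMAP | TOY_UFFD_WP | TOY_GUARD))
		return true;

	/* An anon_vma implies page tables that a fault cannot rebuild. */
	return vma->has_anon_vma;
}

/* The narrower check under discussion: look at anon_vma only. */
static bool toy_anon_vma_only(const struct toy_vma *vma)
{
	assert(vma != NULL);
	return vma->has_anon_vma;
}
```

In this model, a VMA with TOY_PFNMAP set and no anon_vma needs copying
according to toy_needs_copy(), yet toy_anon_vma_only() reports false for it,
which is the gap being pointed out.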
And again, I think the underlying point here overall is:
1. Clearly many cases require serialisation (any that cause copy_page_range() to
fire).
2. If we were to decide not to take a lock with concurrent page faults, that
lays a trap for any future change that (reasonably) assumes that page tables
cannot be simultaneously copied while being accessible to page fault
handlers, which is bug prone.
3. As per 2, even if we were to only take the lock when we felt we absolutely
needed to, we would still be adding yet another 'you just have to know'
hazard to this part of mm.
4. The serialisation is quite likely relied upon by other things, this is often
the case in mm, and we may only realise that such serialisation is critical
at the point a subtle issue arises out of it.
5. Fork is one of the most sensitive, intuition-defying, complicated, and
corner-case-problem-baiting areas of mm and I really oppose us changing
fundamental behaviour here unless incredibly well justified.
On this basis, I think we should let sleeping dogs lie and leave fork alone :)
I think I am far more inclined to take Barry's fault approach (as I've said to
him) vs. changing fork behaviour.
But I want to make sure there's not a 'third way' that could avoid either!
I am going to have a look through Barry's series in detail so we can have some
movement on this one way or another :)
>
> Thanks,
> Yang
>
Cheers, Lorenzo