[PATCH 18/41] mm/khugepaged: write-lock VMA while collapsing a huge page

Wed Jan 18 01:40:09 PST 2023

On Tue 17-01-23 21:28:06, Jann Horn wrote:
> On Tue, Jan 17, 2023 at 4:25 PM Michal Hocko <mhocko at suse.com> wrote:
> > On Mon 09-01-23 12:53:13, Suren Baghdasaryan wrote:
> > > Protect VMA from concurrent page fault handler while collapsing a huge
> > > page. Page fault handler needs a stable PMD to use PTL and relies on
> > > per-VMA lock to prevent concurrent PMD changes. pmdp_collapse_flush(),
> > > set_huge_pmd() and collapse_and_free_pmd() can modify a PMD, which will
> > > not be detected by a page fault handler without proper locking.
> >
> > I am struggling with this changelog. Maybe because my recollection of
> > the THP collapsing subtleties is weak. But aren't you just trying to say
> > that the current #PF handling and THP collapsing need to be mutually
> > exclusive currently so in order to keep that assumption you have mark
> > the vma write locked?
> >
> > Also it is not really clear to me how that handles other vmas which can
> > share the same thp?
> 
> It's not about the hugepage itself, it's about how the THP collapse
> operation frees page tables.
> 
> Before this series, page tables can be walked under any one of the
> mmap lock, the mapping lock, and the anon_vma lock; so when khugepaged
> unlinks and frees page tables, it must ensure that all of those either
> are locked or don't exist. This series adds a fourth lock under which
> page tables can be traversed, and so khugepaged must also lock out that one.
> 
> There is a codepath in khugepaged that iterates through all mappings
> of a file to zap page tables (retract_page_tables()), which locks each
> visited mm with mmap_write_trylock() and now also does
> vma_write_lock().

OK, I see. This would be a great addendum to the changelog.

> I think one aspect of this patch that might cause trouble later on, if
> support for non-anonymous VMAs is added, is that retract_page_tables()
> now does vma_write_lock() while holding the mapping lock; the page
> fault handling path would probably take the locks the other way
> around, leading to a deadlock? So the vma_write_lock() in
> retract_page_tables() might have to become a trylock later on.

This, right?
#PF			retract_page_tables
vma_read_lock
			i_mmap_lock_write
i_mmap_lock_read
			vma_write_lock

I might be missing something but I have only found huge_pmd_share to be
called from the #PF path. That one should be safe as it cannot be a
target for THP. Not that it would matter much because such a dependency
chain would be really subtle.
-- 
Michal Hocko
SUSE Labs