[PATCH RFC 0/6] mm: THP-agnostic refactor on huge mappings

Tue Jul 23 01:18:37 PDT 2024

On 22.07.24 17:31, Peter Xu wrote:
> On Mon, Jul 22, 2024 at 03:29:43PM +0200, David Hildenbrand wrote:
>> On 18.07.24 00:02, Peter Xu wrote:
>>> This is an RFC series, so not yet for merging.  Please don't be scared by
>>> the code changes: most of them are code movements only.
>>>
>>> This series is based on the dax mprotect fix series here (while that one is
>>> based on mm-unstable):
>>>
>>>     [PATCH v3 0/8] mm/mprotect: Fix dax puds
>>>     https://lore.kernel.org/r/20240715192142.3241557-1-peterx@redhat.com
>>>
>>> Overview
>>> ========
>>>
>>> This series doesn't provide any feature change.  The only goal of this
>>> series is to start decoupling two ideas: "THP" and "huge mapping".  We
>>> already started with the PGTABLE_HAS_HUGE_LEAVES config option, and this
>>> one extends that idea into the code.
>>>
>>> The issue is that we have so many functions that only compile with
>>> CONFIG_THP=on, even though they're about huge mappings, and huge mapping is
>>> a pretty common concept, which can apply to many things besides THPs
>>> nowadays.  The major THP file is mm/huge_memory.c as of now.
>>>
>>> The first example of such a huge mapping user will be hugetlb.  We have
>>> lived with this until now without problems simply because Linux duplicated
>>> almost all of the logic in the "THP" files into hugetlb APIs.  If we want
>>> to get rid of hugetlb-specific APIs and paths, this _might_ be the first
>>> thing we want to do, because we want to be able to, e.g., zap a hugetlb
>>> pmd entry even if !CONFIG_THP.
>>>
>>> Then consider other things like dax / pfnmaps.  Dax can depend on THP, and
>>> then it can naturally use the pmd/pud helpers; that's okay.  However, is it
>>> a must?  Do we also want every new kind of pmd/pud mapping in the future
>>> (like PFNMAP) to depend on THP?  My answer is no, but I'm open to opinions.
>>>
>>> If anyone agrees with me that "huge mapping" (aka, PMD/PUD mappings that
>>> are larger than PAGE_SIZE) is a more generic concept than THP, then I think
>>> at some point we need to move the generic code out of THP code into a
>>> common code base.
>>>
>>> This is what this series does as a start.
>>
>> Hi Peter!
>>
>>  From a quick glimpse, patch #1-#4 do make sense independent of patch #5.
>>
>> I am not so sure about all of the code movement in patch #5. If large folios
>> are the future, then likely huge_memory.c should simply be the home for all
>> that logic.
>>
>> Maybe the better goal would be to compile huge_memory.c not only for THP,
>> but also for other use cases that require that logic, and fence off all
>> THP-specific stuff using #ifdef?
>>
>> Not sure, though. But a lot of this code movement/churn might be avoidable.
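A minimal sketch of that #ifdef approach, assuming a shared file that is built whenever any huge-leaf user is configured. The Kconfig symbols are simulated with #defines here, and SKETCH_HAS_HUGE_LEAVES, PMD_SIZE_SKETCH, zap_huge_pmd_sketch(), split_thp_pmd_sketch() and thp_code_built() are hypothetical names for illustration, not existing kernel symbols.

```c
#include <assert.h>

/*
 * Simulated Kconfig for this sketch: hugetlb on, THP off.
 * In a real tree these would come from the build configuration.
 */
#define CONFIG_HUGETLB_PAGE 1
/* #define CONFIG_TRANSPARENT_HUGEPAGE 1 */

/* Assumed PMD size of 2 MiB (e.g. x86-64 with 4 KiB base pages). */
#define PMD_SIZE_SKETCH (2UL << 20)

/* Hypothetical umbrella symbol: any user of PMD/PUD leaf mappings. */
#if defined(CONFIG_HUGETLB_PAGE) || defined(CONFIG_TRANSPARENT_HUGEPAGE)
#define SKETCH_HAS_HUGE_LEAVES 1
#endif

#ifdef SKETCH_HAS_HUGE_LEAVES
/* Generic huge-mapping logic: built for hugetlb, THP, or both. */
static int zap_huge_pmd_sketch(unsigned long addr)
{
	assert((addr & (PMD_SIZE_SKETCH - 1)) == 0); /* must be PMD-aligned */
	return 1; /* pretend one PMD-sized leaf entry was cleared */
}
#endif

#ifdef CONFIG_TRANSPARENT_HUGEPAGE
/* THP-only logic stays fenced off inside the same file. */
static int split_thp_pmd_sketch(unsigned long addr)
{
	(void)addr;
	return 1;
}
#endif

/* Reports whether the THP-only part was compiled in. */
static int thp_code_built(void)
{
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
	return 1;
#else
	return 0;
#endif
}
```

In this configuration the generic zap path is available for hugetlb while the THP-only helper is not compiled at all, without moving either function to a new file.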
> 
> I'm fine using ifdefs in the current file, but IMHO it's a matter of
> whether we want to keep growing huge_memory.c into an even larger file, and
> keep all large folio logic only in that file.  Currently it's ~4000 LOCs.

Depends on "how much" for sure. huge_memory.c is currently in 12th place
among the biggest files in mm/. So there might not be any immediate cause
for action ... just yet :) [guess which file is #2 :) ]

> 
> Normally I don't see this as much of a "code churn" category, because it
> doesn't change the code itself but only moves things.  I personally also
> prefer avoiding code churn, but mainly in cases where it brings tiny
> functional changes here and there without real benefit.
> 
> It seems pretty unavoidable to me once a file grows too large and needs
> to be split, and in that case git doesn't have a good way to track such
> movement.

Yes, that's what I mean.

I've been recently thinking if we should pursue a different direction:

Just as we recently relocated most follow_huge_* stuff into gup.c, we
should likely look into moving copy_huge_pmd, change_huge_pmd ... into
the files where they logically belong.

In madvise.c, we've been doing that in some places already: For 
madvise_cold_or_pageout_pte_range() we inline the code, but not for 
madvise_free_huge_pmd().

pmd_trans_huge() already compiles to a NOP without
CONFIG_TRANSPARENT_HUGEPAGE, but to let that code avoid most
CONFIG_TRANSPARENT_HUGEPAGE #ifdefs, we'd need a couple more function
stubs to keep the compiler happy while still being able to compile that
code out when it's not required.

The idea would be that, e.g., pmd_leaf() would return "false" at compile
time if none of the relevant configurations (THP, HUGETLB, ...) were
enabled. So we could just use pmd_leaf() like pmd_trans_huge() in the
relevant code and have the compiler optimize it all out, without putting
it into separate files.

That means, large folios and PMD/PUD mappings will become "more common" 
and better integrated, without the need to jump between files.
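A minimal sketch of that idea, under stated assumptions: the Kconfig symbols are simulated with #defines, and pmd_sketch_t, PMD_LEAF_BIT_SKETCH, pmd_leaf_sketch(), zap_huge_pmd_sketch() and zap_pmd_range_sketch() are hypothetical stand-ins, not the kernel's pmd_t, pmd_leaf() or zap_huge_pmd(). With neither THP nor hugetlb enabled, the predicate is a compile-time "false" and a stub keeps the now-dead call type-checking, so the generic caller needs no #ifdef.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simulated Kconfig for this sketch: neither THP nor hugetlb enabled. */
/* #define CONFIG_TRANSPARENT_HUGEPAGE 1 */
/* #define CONFIG_HUGETLB_PAGE 1 */

/* Hypothetical stand-in for the kernel's pmd_t. */
typedef struct { unsigned long val; } pmd_sketch_t;
#define PMD_LEAF_BIT_SKETCH 0x80UL /* assumed "leaf" bit, illustration only */

#if defined(CONFIG_TRANSPARENT_HUGEPAGE) || defined(CONFIG_HUGETLB_PAGE)
static inline bool pmd_leaf_sketch(pmd_sketch_t pmd)
{
	return pmd.val & PMD_LEAF_BIT_SKETCH;
}
int zap_huge_pmd_sketch(pmd_sketch_t *pmd); /* real body would live elsewhere */
#else
/* No huge-leaf user configured: the predicate is a compile-time false ... */
static inline bool pmd_leaf_sketch(pmd_sketch_t pmd)
{
	(void)pmd;
	return false;
}
/* ... plus a stub so the dead call below still compiles. */
static inline int zap_huge_pmd_sketch(pmd_sketch_t *pmd)
{
	(void)pmd;
	return 0;
}
#endif

/* Generic caller with no #ifdef: the leaf branch is dead code here. */
static int zap_pmd_range_sketch(pmd_sketch_t *pmd)
{
	assert(pmd != NULL);
	if (pmd_leaf_sketch(*pmd))
		return zap_huge_pmd_sketch(pmd);
	return -1; /* would descend to the PTE level (not modeled here) */
}
```

With optimization enabled, the compiler can drop the leaf branch entirely in this configuration; defining either config symbol would instead require linking a real zap_huge_pmd_sketch().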

Just some thoughts on an alternative that would make sense to me.

> 
> Independent of this, I think there's still the option of making huge
> pfnmap depend on THP again, which shouldn't be a huge deal (I don't have
> any use case that needs huge pfnmap with THP disabled, anyway), so this
> series isn't an immediate concern to me for that route.  But for a
> hugetlb rework this might be something we need to do, because we simply
> can't make CONFIG_HUGETLB rely on CONFIG_THP.

Yes, likely. FSDAX went in a similar direction and called that FSDAX thing
a "THP", whereas it really doesn't have anything in common with a THP
besides being partially mappable, IMHO.

-- 
Cheers,

David / dhildenb