[PATCH RFC 0/6] mm: THP-agnostic refactor on huge mappings
David Hildenbrand
david at redhat.com
Tue Jul 23 01:18:37 PDT 2024
On 22.07.24 17:31, Peter Xu wrote:
> On Mon, Jul 22, 2024 at 03:29:43PM +0200, David Hildenbrand wrote:
>> On 18.07.24 00:02, Peter Xu wrote:
>>> This is an RFC series, so not yet for merging. Please don't be scared by
>>> the code changes: most of them are code movements only.
>>>
>>> This series is based on the dax mprotect fix series here (while that one is
>>> based on mm-unstable):
>>>
>>> [PATCH v3 0/8] mm/mprotect: Fix dax puds
>>> https://lore.kernel.org/r/20240715192142.3241557-1-peterx@redhat.com
>>>
>>> Overview
>>> ========
>>>
>>> This series doesn't provide any feature change. The only goal of this
>>> series is to start decoupling two ideas: "THP" and "huge mapping". We
>>> already started with having PGTABLE_HAS_HUGE_LEAVES config option, and this
>>> one extends that idea into the code.
>>>
>>> The issue is that we have so many functions that only compile with
>>> CONFIG_THP=on, even though they're about huge mappings, and huge mapping is
>>> a pretty common concept, which can apply to many things besides THPs
>>> nowadays. The major THP file is mm/huge_memory.c as of now.
>>>
>>> The first example of such huge mapping users will be hugetlb. We lived
>>> until now with no problem simply because Linux almost duplicated all the
>>> logics there in the "THP" files into hugetlb APIs. If we want to get rid
>>> of hugetlb specific APIs and paths, this _might_ be the first thing we want
>>> to do, because we want to be able to e.g., zapping a hugetlb pmd entry even
>>> if !CONFIG_THP.
>>>
>>> Then consider other things like dax / pfnmaps. Dax can depend on THP, then
>>> it'll naturally be able to use pmd/pud helpers, that's okay. However is it
>>> a must? Do we also want to have every new pmd/pud mappings in the future
>>> to depend on THP (like PFNMAP)? My answer is no, but I'm open to opinions.
>>>
>>> If anyone agrees with me that "huge mapping" (aka, PMD/PUD mappings that
>>> are larger than PAGE_SIZE) is a more generic concept than THP, then I think
>>> at some point we need to move the generic code out of THP code into a
>>> common code base.
>>>
>>> This is what this series does as a start.
>>
>> Hi Peter!
>>
>> From a quick glimpse, patch #1-#4 do make sense independent of patch #5.
>>
>> I am not so sure about all of the code movement in patch #5. If large folios
>> are the future, then likely huge_memory.c should simply be the home for all
>> that logic.
>>
>> Maybe the goal should better be to compile huge_memory.c not only for THP,
>> but also for other use cases that require that logic, and fence off all THP
>> specific stuff using #ifdef?
>>
>> Not sure, though. But a lot of this code movements/churn might be avoidable.
>
> I'm fine using ifdefs in the current file, but IMHO it's a matter of
> whether we want to keep huge_memory.c growing into even larger file, and
> keep all large folio logics only in that file. Currently it's ~4000 LOCs.

Depends on "how much" for sure. huge_memory.c is currently in 12th place
among the biggest files in mm/. So there might not be immediate cause for
action ... just yet :) [guess which file is on #2 :) ]

>
> Normally I don't see this as much of a "code churn" category, because it
> doesn't change the code itself but only moves things. I personally also
> prefer without code churns, but only in the case where there'll be tiny
> little functional changes here and there without real benefit.
>
> It's pretty unavoidable to me when one file grows too large and we'll need
> to split, and in this case git doesn't have a good way to track such
> movement..

Yes, that's what I mean.

I've recently been wondering whether we should pursue a different direction:
Just as we recently relocated most follow_huge_* stuff into gup.c,
likely we should rather look into moving copy_huge_pmd, change_huge_pmd,
... into the files where they logically belong.

In madvise.c, we've been doing that in some places already: For
madvise_cold_or_pageout_pte_range() we inline the code, but not for
madvise_free_huge_pmd().

pmd_trans_huge() would already compile to a NOP without
CONFIG_TRANSPARENT_HUGEPAGE, but to make that code avoid most
CONFIG_TRANSPARENT_HUGEPAGE #ifdefs, we'd need a couple more function
stubs to make the compiler happy while still being able to compile that
code out when not required.

The idea would be that e.g., pmd_leaf() would return "false" at compile
time if no relevant configuration (THP, HUGETLB, ...) is enabled. So
we could just use pmd_leaf() similar to pmd_trans_huge() in relevant
code and have the compiler optimize it all out without putting it into
separate files.
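
A minimal userspace sketch of that idea follows. It is not kernel code:
SKETCH_HAS_HUGE_LEAVES, SKETCH_PMD_LEAF_BIT, sketch_pmd_t and all sketch_*
functions are hypothetical names made up for illustration, standing in for
"some huge-mapping user is configured", pmd_leaf() and a huge-PMD helper.
With the macro at 0, the leaf check is a constant false and the stub keeps
the caller compiling, so the compiler can drop the huge branch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical switch: 1 if any huge-mapping user (THP, HUGETLB, ...) is on. */
#ifndef SKETCH_HAS_HUGE_LEAVES
#define SKETCH_HAS_HUGE_LEAVES 0
#endif

/* Hypothetical PMD entry type and leaf bit, for illustration only. */
typedef struct { unsigned long val; } sketch_pmd_t;
#define SKETCH_PMD_LEAF_BIT 0x80UL

#if SKETCH_HAS_HUGE_LEAVES
/* Real check: the entry maps a huge page directly. */
static inline bool sketch_pmd_leaf(sketch_pmd_t pmd)
{
	return (pmd.val & SKETCH_PMD_LEAF_BIT) != 0;
}

/* Real helper: clear the huge entry and report that it was handled. */
static int sketch_zap_huge_pmd(sketch_pmd_t *pmd)
{
	pmd->val = 0;
	return 1;
}
#else
/* Constant false: callers' huge branches become dead code. */
static inline bool sketch_pmd_leaf(sketch_pmd_t pmd)
{
	(void)pmd;
	return false;
}

/* Stub that only exists to keep callers compiling. */
static inline int sketch_zap_huge_pmd(sketch_pmd_t *pmd)
{
	(void)pmd;
	return 0;
}
#endif

/* Caller lives in "its own file" and needs no #ifdef of its own. */
static int sketch_zap_pmd_range(sketch_pmd_t *pmd)
{
	assert(pmd);
	if (sketch_pmd_leaf(*pmd))
		return sketch_zap_huge_pmd(pmd);
	/* Stand-in for the regular PTE-level path. */
	pmd->val = 0;
	return 0;
}
```

Building the same file with -DSKETCH_HAS_HUGE_LEAVES=1 switches in the real
check and helper without touching sketch_zap_pmd_range().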

That means large folios and PMD/PUD mappings will become "more common"
and better integrated, without the need to jump between files.

Just some thoughts about an alternative that would make sense to me.

>
> Irrespective of this, just to mention I think there's still one option: I
> at least can make the huge pfnmap depend on THP again, which shouldn't be a
> huge deal (I don't have any use case that needs huge pfnmap but disable
> THP, anyway..), so this series isn't an immediate concern to me for that
> route. But for a hugetlb rework this might be something we need to do,
> because we simply can't make CONFIG_HUGETLB rely on CONFIG_THP..

Yes, likely. FSDAX went a similar direction and called that FSDAX thing
a "THP" whereby it really doesn't have anything in common with a THP,
besides being partially mappable -- IMHO.

--
Cheers,
David / dhildenb