[PATCH v2 4/5] mm: FLEXIBLE_THP for improved performance

David Hildenbrand david at redhat.com
Fri Jul 7 04:29:02 PDT 2023


On 07.07.23 11:52, Ryan Roberts wrote:
> On 07/07/2023 09:01, Huang, Ying wrote:
>> Ryan Roberts <ryan.roberts at arm.com> writes:
>>
>>> Introduce FLEXIBLE_THP feature, which allows anonymous memory to be
>>> allocated in large folios of a specified order. All pages of the large
>>> folio are pte-mapped during the same page fault, significantly reducing
>>> the number of page faults. The number of per-page operations (e.g. ref
>>> counting, rmap management, lru list management) is also significantly
>>> reduced since those ops now become per-folio.
>>
>> I like the idea to share as much code as possible between large
>> (anonymous) folio and THP.  Finally, THP becomes just a special kind of
>> large folio.
>>
>> Although we can use smaller page order for FLEXIBLE_THP, it's hard to
>> avoid internal fragmentation completely.  So, I think that finally we
>> will need to provide a mechanism for the users to opt out, e.g.,
>> something like "always madvise never" via
>> /sys/kernel/mm/transparent_hugepage/enabled.  I'm not sure whether it's
>> a good idea to reuse the existing interface of THP.
> 
> I wouldn't want to tie this to the existing interface, simply because that
> implies that we would want to follow the "always" and "madvise" advice too. That
> means that on a thp=madvise system (which is certainly the case for android and
> other client systems) we would have to disable large anon folios for VMAs that
> haven't explicitly opted in. That breaks the intention that this should be an
> invisible performance boost. I think it's important to set the policy for use of

It will never ever be a completely invisible performance boost, just 
like ordinary THP.

Using the exact same existing toggle is the right thing to do. If 
someone specifies "never" or "madvise", then do exactly that.

It might make sense to have more modes or additional toggles, but 
"madvise=never" means no memory waste.
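
For reference, the existing toggle reports all modes on one line and marks
the active one with brackets, e.g. "always [madvise] never". A minimal
user-space sketch of picking out the active mode follows; the helper name
thp_active_mode() is hypothetical and only illustrates the file format, it
is not an existing kernel or libc function.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy the bracketed (active) mode out of a line such as
 * "always [madvise] never", as read from
 * /sys/kernel/mm/transparent_hugepage/enabled.
 *
 * Hypothetical helper for illustration only.
 * Returns 0 on success, -1 if no bracketed mode is found or if
 * the output buffer is too small.
 */
static int thp_active_mode(const char *line, char *out, size_t outlen)
{
	const char *start = strchr(line, '[');
	const char *end = start ? strchr(start, ']') : NULL;
	size_t len;

	if (!start || !end || outlen == 0)
		return -1;

	len = (size_t)(end - start - 1);
	if (len >= outlen)
		return -1;

	memcpy(out, start + 1, len);
	out[len] = '\0';
	return 0;
}
```

A caller would read the sysfs file into a buffer with fopen()/fgets() and
pass that buffer as "line".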


I remember I raised it already in the past, but you *absolutely* have to 
respect the MADV_NOHUGEPAGE flag. There is user space out there (for 
example, userfaultfd) that doesn't want the kernel to populate any 
additional page tables. So if you have to respect that already, then 
also respect MADV_HUGEPAGE, simple.
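
To spell out what "respect the toggle and the madvise flags" would mean, here
is a rough sketch of the decision, written as stand-alone C rather than
kernel code. The enum and function names are made up for illustration and do
not correspond to anything in the patch series; the logic simply mirrors how
ordinary THP treats "always"/"madvise"/"never" together with
MADV_HUGEPAGE/MADV_NOHUGEPAGE.

```c
#include <stdbool.h>

/* Hypothetical model of the global "enabled" setting. */
enum thp_mode { THP_NEVER, THP_MADVISE, THP_ALWAYS };

/* Hypothetical model of the per-VMA madvise() state. */
enum vma_hint { HINT_NONE, HINT_HUGEPAGE, HINT_NOHUGEPAGE };

/*
 * Would a large anon folio be permitted for this VMA if the feature
 * followed the existing THP policy?
 */
static bool large_anon_folio_allowed(enum thp_mode mode, enum vma_hint hint)
{
	/* MADV_NOHUGEPAGE always wins, whatever the global mode is. */
	if (hint == HINT_NOHUGEPAGE)
		return false;

	switch (mode) {
	case THP_NEVER:
		return false;			/* no memory waste, ever */
	case THP_MADVISE:
		return hint == HINT_HUGEPAGE;	/* explicit opt-in only */
	case THP_ALWAYS:
		return true;
	}
	return false;
}
```

With this, a VMA without any hint on a thp=madvise system gets small pages
only, which is exactly the behaviour Ryan is worried about above.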

> THP separately to use of large anon folios.
> 
> I could be persuaded on the merits of a new runtime enable/disable interface if
> there is consensus.

There would have to be very good reason for a completely separate 
control. Bypassing MADV_NOHUGEPAGE or "madvise=never" simply because we 
add a "flexible" before the THP sounds broken.

-- 
Cheers,

David / dhildenb




More information about the linux-arm-kernel mailing list