[RESEND PATCH v7 03/10] mm: thp: Introduce per-size thp sysfs interface

David Hildenbrand david at redhat.com
Wed Nov 29 00:05:31 PST 2023


On 29.11.23 04:42, John Hubbard wrote:
> On 11/22/23 08:29, Ryan Roberts wrote:
>> In preparation for adding support for anonymous small-sized THP,
>> introduce new sysfs structure that will be used to control the new
>> behaviours. A new directory is added under transparent_hugepage for each
>> supported THP size, and contains an `enabled` file, which can be set to
>> "global" (to inherit the global setting), "always", "madvise" or
>> "never". For now, the kernel still only supports PMD-sized anonymous
>> THP, so only 1 directory is populated.
>>
>> The first half of the change converts transhuge_vma_suitable() and
>> hugepage_vma_check() so that they take a bitfield of orders for which
>> the user wants to determine support, and the functions filter out all
>> the orders that can't be supported, given the current sysfs
>> configuration and the VMA dimensions. If there is only 1 order set in
>> the input then the output can continue to be treated like a boolean;
>> this is the case for most call sites.
>>
>> The second half of the change implements the new sysfs interface. It has
>> been done so that each supported THP size has a `struct thpsize`, which
>> describes the relevant metadata and is itself a kobject. This is pretty
>> minimal for now, but should make it easy to add new per-thpsize files to
>> the interface if needed in future (e.g. per-size defrag). Rather than
>> keep the `enabled` state directly in the struct thpsize, I've elected to
>> directly encode it into huge_anon_orders_[always|madvise|global]
>> bitfields since this reduces the amount of work required in
>> transhuge_vma_suitable() which is called for every page fault.
>>
>> The remainder is copied from Documentation/admin-guide/mm/transhuge.rst,
>> as modified by this commit. See that file for further details.
>>
>> Transparent Hugepage Support for anonymous memory can be entirely
>> disabled (mostly for debugging purposes) or only enabled inside
>> MADV_HUGEPAGE regions (to avoid the risk of consuming more memory
>> resources) or enabled system wide. This can be achieved
>> per-supported-THP-size with one of::
>>
>> 	echo always >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled
>> 	echo madvise >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled
>> 	echo never >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled
>>
>> where <size> is the hugepage size being addressed, the available sizes
>> for which vary by system. Alternatively it is possible to specify that
>> a given hugepage size will inherit the global enabled setting::
>>
>> 	echo global >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled
>>
>> The global (legacy) enabled setting can be set as follows::
>>
>> 	echo always >/sys/kernel/mm/transparent_hugepage/enabled
>> 	echo madvise >/sys/kernel/mm/transparent_hugepage/enabled
>> 	echo never >/sys/kernel/mm/transparent_hugepage/enabled
>>
>> By default, PMD-sized hugepages have enabled="global" and all other
>> hugepage sizes have enabled="never". If enabling multiple hugepage
>> sizes, the kernel will select the most appropriate enabled size for a
>> given allocation.
>>
>> Signed-off-by: Ryan Roberts <ryan.roberts at arm.com>
>> ---
>>    Documentation/admin-guide/mm/transhuge.rst |  74 ++++--
>>    Documentation/filesystems/proc.rst         |   6 +-
>>    fs/proc/task_mmu.c                         |   3 +-
>>    include/linux/huge_mm.h                    | 100 +++++---
>>    mm/huge_memory.c                           | 263 +++++++++++++++++++--
>>    mm/khugepaged.c                            |  16 +-
>>    mm/memory.c                                |   6 +-
>>    mm/page_vma_mapped.c                       |   3 +-
>>    8 files changed, 387 insertions(+), 84 deletions(-)
>>
>> diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
>> index b0cc8243e093..52565e0bd074 100644
>> --- a/Documentation/admin-guide/mm/transhuge.rst
>> +++ b/Documentation/admin-guide/mm/transhuge.rst
>> @@ -45,10 +45,23 @@ components:
>>       the two is using hugepages just because of the fact the TLB miss is
>>       going to run faster.
>>
>> +As well as PMD-sized THP described above, it is also possible to
>> +configure the system to allocate "small-sized THP" to back anonymous
> 
> Here's one of the places to change to the new name, which lately is
> "multi-size THP", or mTHP or m_thp for short. (I've typed "multi-size"
> instead of "multi-sized", because the 'd' doesn't add significantly to
> the meaning, and if in doubt, shorter is better.)
> 
> 
>> +memory (for example 16K, 32K, 64K, etc). These THPs continue to be
>> +PTE-mapped, but in many cases can still provide similar benefits to
>> +those outlined above: Page faults are significantly reduced (by a
>> +factor of e.g. 4, 8, 16, etc), but latency spikes are much less
>> +prominent because the size of each page isn't as huge as the PMD-sized
>> +variant and there is less memory to clear in each page fault. Some
>> +architectures also employ TLB compression mechanisms to squeeze more
>> +entries in when a set of PTEs are virtually and physically contiguous
>> +and appropriately aligned. In this case, TLB misses will occur less
>> +often.
>> +
> 
> OK, all of the above still seems like it can remain the same.
> 
>>    THP can be enabled system wide or restricted to certain tasks or even
>>    memory ranges inside task's address space. Unless THP is completely
>>    disabled, there is ``khugepaged`` daemon that scans memory and
>> -collapses sequences of basic pages into huge pages.
>> +collapses sequences of basic pages into PMD-sized huge pages.
>>
>>    The THP behaviour is controlled via :ref:`sysfs <thp_sysfs>`
>>    interface and using madvise(2) and prctl(2) system calls.
>> @@ -95,12 +108,29 @@ Global THP controls
>>    Transparent Hugepage Support for anonymous memory can be entirely disabled
>>    (mostly for debugging purposes) or only enabled inside MADV_HUGEPAGE
>>    regions (to avoid the risk of consuming more memory resources) or enabled
>> -system wide. This can be achieved with one of::
>> +system wide. This can be achieved per-supported-THP-size with one of::
>> +
>> +	echo always >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled
>> +	echo madvise >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled
>> +	echo never >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled
>> +
>> +where <size> is the hugepage size being addressed, the available sizes
>> +for which vary by system. Alternatively it is possible to specify that
>> +a given hugepage size will inherrit the global enabled setting::
> 
> typo: inherrit
> 
>> +
>> +	echo global >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled
>> +
>> +The global (legacy) enabled setting can be set as follows::
>>
>>    	echo always >/sys/kernel/mm/transparent_hugepage/enabled
>>    	echo madvise >/sys/kernel/mm/transparent_hugepage/enabled
>>    	echo never >/sys/kernel/mm/transparent_hugepage/enabled
>>
>> +By default, PMD-sized hugepages have enabled="global" and all other
>> +hugepage sizes have enabled="never". If enabling multiple hugepage
>> +sizes, the kernel will select the most appropriate enabled size for a
>> +given allocation.
>> +
> 
> This is slightly murky. I wonder if "inherited" is a little more directly
> informative than global; it certainly felt that way my first time running
> this and poking at it.
> 
> And a few trivial examples would be a nice touch.
> 
> And so overall with a few other minor tweaks, I'd suggest this:
> 
> ...
> where <size> is the hugepage size being addressed, the available sizes
> for which vary by system.
> 
> For example:
> 	echo always >/sys/kernel/mm/transparent_hugepage/hugepages-2048kB/enabled
> 
> Alternatively it is possible to specify that a given hugepage size will inherit
> the top-level "enabled" value:
> 
> 	echo inherited >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled
> 
> For example:
> 	echo inherited >/sys/kernel/mm/transparent_hugepage/hugepages-2048kB/enabled
> 
> The top-level setting (for use with "inherited") can be set by issuing one of the
> following commands::
> 
> 	echo always >/sys/kernel/mm/transparent_hugepage/enabled
> 	echo madvise >/sys/kernel/mm/transparent_hugepage/enabled
> 	echo never >/sys/kernel/mm/transparent_hugepage/enabled
> 
> By default, PMD-sized hugepages have enabled="inherited" and all other
> hugepage sizes have enabled="never".

"inherited" works for me as well.

-- 
Cheers,

David / dhildenb



