[PATCH v8 03/10] mm: thp: Introduce multi-size THP sysfs interface

David Hildenbrand david at redhat.com
Tue Dec 5 01:57:41 PST 2023


On 05.12.23 10:50, Ryan Roberts wrote:
> On 05/12/2023 04:21, Barry Song wrote:
>> On Mon, Dec 4, 2023 at 11:21 PM Ryan Roberts <ryan.roberts at arm.com> wrote:
>>>
>>> In preparation for adding support for anonymous multi-size THP,
>>> introduce a new sysfs structure that will be used to control the new
>>> behaviours. A new directory is added under transparent_hugepage for each
>>> supported THP size, and contains an `enabled` file, which can be set to
>>> "inherit" (to inherit the global setting), "always", "madvise" or
>>> "never". For now, the kernel still only supports PMD-sized anonymous
>>> THP, so only 1 directory is populated.
>>>
>>> The first half of the change converts transhuge_vma_suitable() and
>>> hugepage_vma_check() so that they take a bitfield of orders for which
>>> the user wants to determine support, and the functions filter out all
>>> the orders that can't be supported, given the current sysfs
>>> configuration and the VMA dimensions. If there is only 1 order set in
>>> the input then the output can continue to be treated like a boolean;
>>> this is the case for most call sites. The resulting functions are
>>> renamed to thp_vma_suitable_orders() and thp_vma_allowable_orders()
>>> respectively.
>>>
>>> The second half of the change implements the new sysfs interface. It has
>>> been done so that each supported THP size has a `struct thpsize`, which
>>> describes the relevant metadata and is itself a kobject. This is pretty
>>> minimal for now, but should make it easy to add new per-thpsize files to
>>> the interface if needed in future (e.g. per-size defrag). Rather than
>>> keep the `enabled` state directly in the struct thpsize, I've elected to
>>> directly encode it into huge_anon_orders_[always|madvise|inherit]
>>> bitfields since this reduces the amount of work required in
>>> thp_vma_allowable_orders() which is called for every page fault.
>>>
>>> See Documentation/admin-guide/mm/transhuge.rst, as modified by this
>>> commit, for details of how the new sysfs interface works.
>>>
>>> Signed-off-by: Ryan Roberts <ryan.roberts at arm.com>
>>
>> Reviewed-by: Barry Song <v-songbaohua at oppo.com>
> 
> Thanks!
> 
>>
>>> -khugepaged will be automatically started when
>>> -transparent_hugepage/enabled is set to "always" or "madvise, and it'll
>>> -be automatically shutdown if it's set to "never".
>>> +khugepaged will be automatically started when one or more hugepage
>>> +sizes are enabled (either by directly setting "always" or "madvise",
>>> +or by setting "inherit" while the top-level enabled is set to "always"
>>> +or "madvise"), and it'll be automatically shutdown when the last
>>> +hugepage size is disabled (either by directly setting "never", or by
>>> +setting "inherit" while the top-level enabled is set to "never").
>>>
>>>   Khugepaged controls
>>>   -------------------
>>>
>>> +.. note::
>>> +   khugepaged currently only searches for opportunities to collapse to
>>> +   PMD-sized THP and no attempt is made to collapse to other THP
>>> +   sizes.
>>
>> For small-size THP, collapse is probably a bad idea. On Android we prefer
>> a one-shot attempt, especially since we use large folio sizes of 64KB and
>> below. If the page fault succeeds in getting large folios, we map them;
>> otherwise we give up, because that memory is quite unstable: it can be
>> swapped out, swapped in, and discarded via MADV_DONTNEED.
>>
>> Too many compactions will increase power consumption and degrade UI
>> responsiveness.
> 
> Understood; that's very useful information for the Android context. Multiple
> people have commented, though, that the server context will eventually need
> khugepaged (or something similar) to asynchronously collapse to contpte size.
> Actually one suggestion was a user space daemon that scans and collapses with
> MADV_COLLAPSE. I suspect the key will be to ensure whatever solution we go for
> is flexible and can be enabled/disabled/configured for the different environments.

There certainly is interest in 2 MiB THP on arm64 with 64k base pages,
where the PMD-sized THP would normally be 512 MiB. In that scenario,
khugepaged makes perfect sense.
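
To make the sizes concrete, here is a rough sketch of the arithmetic. It
assumes 8-byte page-table entries, so that one table page holds
page_size / 8 entries; pmd_size_bytes() and thp_order() are hypothetical
helper names used only for illustration, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: each page-table entry is 8 bytes wide. */
#define PTE_BYTES 8ULL

/*
 * Hypothetical helper: bytes mapped by one PMD entry, i.e. one full
 * table of PTEs, for a given (power-of-two) base page size.
 */
static uint64_t pmd_size_bytes(uint64_t page_size)
{
	assert(page_size && (page_size & (page_size - 1)) == 0);
	return page_size * (page_size / PTE_BYTES);
}

/*
 * Hypothetical helper: the order N such that
 * (page_size << N) == thp_size, for a power-of-two page count.
 */
static unsigned int thp_order(uint64_t thp_size, uint64_t page_size)
{
	uint64_t pages;
	unsigned int order = 0;

	assert(page_size && thp_size >= page_size && thp_size % page_size == 0);
	pages = thp_size / page_size;
	assert((pages & (pages - 1)) == 0);
	while (pages > 1) {
		pages >>= 1;
		order++;
	}
	return order;
}
```

Under these assumptions, 64k base pages give a PMD size of 512 MiB
(order 13), while a 2 MiB THP is only 32 base pages (order 5). With 4k
base pages the PMD size is the familiar 2 MiB (order 9).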

-- 
Cheers,

David / dhildenb




More information about the linux-arm-kernel mailing list