[PATCH 10/12] kho: extended scratch
Pratyush Yadav
pratyush at kernel.org
Mon May 18 10:04:53 PDT 2026
On Sun, May 17 2026, Mike Rapoport wrote:
> On Wed, Apr 29, 2026 at 03:39:12PM +0200, Pratyush Yadav wrote:
>> From: "Pratyush Yadav (Google)" <pratyush at kernel.org>
>>
>> Methodology
>> ===========
>>
>> Introduce extended scratch areas. These areas are discovered at boot by
>> walking the preserved memory radix tree and looking for free blocks of
>> memory. They are then marked as scratch to allow allocations from them. This
>> makes KHO more resilient to memory pressure and allows supporting huge
>> page preservation.
>>
>> Since the preserved memory radix tree mixes both physical address and
>> order into a single key, and does not track table pages, it is difficult
>> to identify free areas from it directly. Walk the tree and digest it
>> down into another radix tree. The latter tracks blocks of
>> KHO_EXT_SHIFT (1 GiB as of now) granularity. Then walk the digested tree
>> and mark the areas between the present keys as scratch.
>>
>> Signed-off-by: Pratyush Yadav (Google) <pratyush at kernel.org>
>> ---
>> include/linux/kexec_handover.h | 1 +
>> kernel/liveupdate/kexec_handover.c | 148 +++++++++++++++++++++++++----
>> mm/mm_init.c | 1 +
>> 3 files changed, 133 insertions(+), 17 deletions(-)
>>
>> diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c
>> index 1a04e089f779..c2b843a5fb28 100644
>> --- a/kernel/liveupdate/kexec_handover.c
>> +++ b/kernel/liveupdate/kexec_handover.c
>> @@ -840,6 +857,120 @@ static void __init kho_reserve_scratch(void)
>> kho_enable = false;
>> }
>>
>> +#define KHO_EXT_SHIFT 30 /* 1 GiB */
>
> arm64 does not necessarily use 1G gigantic pages and worse, it can have 2
> gigantic hstates.
>
> I think this should take into account what actual gigantic page sizes are
> in use for the general case.

This has nothing to do with gigantic page sizes. This is simply the
granularity at which KHO looks for free blocks. Making it larger means
lower memory usage and better performance, at the cost of recovering
less memory. Making it smaller does the opposite.

I picked 1G because it "feels" like the right balance. It is mostly gut
feeling without real science behind the number. I can make it smaller or
larger if you'd like.

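To make the trade-off concrete, here is a rough userspace sketch. It is
not the code from the patch: struct phys_range, count_free_blocks() and
recovered_bytes() are hypothetical names made up for illustration, and a
flat array stands in for the preserved memory radix tree. It only shows
how the block shift changes the number of blocks to track and the amount
of memory that ends up usable as scratch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one preserved physical range, [start, end). */
struct phys_range {
	uint64_t start;
	uint64_t end;
};

/*
 * Count the blocks of size (1 << shift) in [0, mem_end) that contain no
 * preserved byte. Only such blocks could be handed out as scratch.
 */
static uint64_t count_free_blocks(const struct phys_range *preserved,
				  size_t nr, uint64_t mem_end,
				  unsigned int shift)
{
	uint64_t nr_blocks, free_blocks = 0;

	assert(shift < 64);
	nr_blocks = mem_end >> shift;

	for (uint64_t b = 0; b < nr_blocks; b++) {
		uint64_t bstart = b << shift;
		uint64_t bend = bstart + (1ULL << shift);
		bool busy = false;

		/* A block is busy if any preserved range overlaps it. */
		for (size_t i = 0; i < nr; i++) {
			if (preserved[i].start < bend &&
			    preserved[i].end > bstart) {
				busy = true;
				break;
			}
		}

		if (!busy)
			free_blocks++;
	}

	return free_blocks;
}

/* Bytes recoverable as scratch at the given granularity. */
static uint64_t recovered_bytes(const struct phys_range *preserved,
				size_t nr, uint64_t mem_end,
				unsigned int shift)
{
	return count_free_blocks(preserved, nr, mem_end, shift) << shift;
}
```

With 8 GiB of memory and two small preserved ranges (4 KiB just above
1 GiB, 2 MiB at 5 GiB), a 512 MiB granularity tracks 16 blocks and
recovers 7 GiB, 1 GiB tracks 8 blocks and recovers 6 GiB, and 2 GiB
tracks 4 blocks and recovers 4 GiB.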
--
Regards,
Pratyush Yadav