[PATCH v7 2/3] kho: fix deferred init of kho scratch

Wed Mar 18 08:45:25 PDT 2026

On Wed, Mar 18, 2026 at 4:26 PM Zi Yan <ziy at nvidia.com> wrote:
>
> On 18 Mar 2026, at 11:18, Michał Cłapiński wrote:
>
> > On Wed, Mar 18, 2026 at 4:10 PM Zi Yan <ziy at nvidia.com> wrote:
> >>
> >> On 17 Mar 2026, at 10:15, Michal Clapinski wrote:
> >>
> >>> Currently, if DEFERRED is enabled, kho_release_scratch will initialize
> >>> the struct pages and set migratetype of kho scratch. Unless the whole
> >>> scratch fits below first_deferred_pfn, some of that will be overwritten
> >>> either by deferred_init_pages or memmap_init_reserved_pages.
> >>>
> >>> To fix it, I modified kho_release_scratch to only set the migratetype
> >>> on already initialized pages. Then, modified init_pageblock_migratetype
> >>> to set the migratetype to CMA if the page is located inside scratch.
> >>>
> >>> Signed-off-by: Michal Clapinski <mclapinski at google.com>
> >>> ---
> >>>  include/linux/memblock.h           |  2 --
> >>>  kernel/liveupdate/kexec_handover.c | 10 ++++++----
> >>>  mm/memblock.c                      | 22 ----------------------
> >>>  mm/page_alloc.c                    |  7 +++++++
> >>>  4 files changed, 13 insertions(+), 28 deletions(-)
> >>>
> >>
> >> <snip>
> >>
> >>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> >>> index ee81f5c67c18..5ca078dde61d 100644
> >>> --- a/mm/page_alloc.c
> >>> +++ b/mm/page_alloc.c
> >>> @@ -55,6 +55,7 @@
> >>>  #include <linux/cacheinfo.h>
> >>>  #include <linux/pgalloc_tag.h>
> >>>  #include <linux/mmzone_lock.h>
> >>> +#include <linux/kexec_handover.h>
> >>>  #include <asm/div64.h>
> >>>  #include "internal.h"
> >>>  #include "shuffle.h"
> >>> @@ -549,6 +550,12 @@ void __meminit init_pageblock_migratetype(struct page *page,
> >>>                    migratetype < MIGRATE_PCPTYPES))
> >>>               migratetype = MIGRATE_UNMOVABLE;
> >>>
> >>> +     /*
> >>> +      * Mark KHO scratch as CMA so no unmovable allocations are made there.
> >>> +      */
> >>> +     if (unlikely(kho_scratch_overlap(page_to_phys(page), PAGE_SIZE)))
> >>> +             migratetype = MIGRATE_CMA;
> >>> +
> >>
> >> If this is only for deferred init code, why not put it in deferred_free_pages()?
> >> Otherwise, all init_pageblock_migratetype() callers need to pay the penalty
> >> of traversing kho_scratch array.
> >
> > Because reserve_bootmem_region() doesn't call deferred_free_pages().
> > So I would also have to modify it.
> >
> > And the early initialization won't pay the penalty of traversing the
> > kho_scratch array, since then kho_scratch is NULL.
>
> How about hugetlb_bootmem_init_migratetype(), init_cma_pageblock(),
> init_cma_reserved_pageblock(), __init_page_from_nid(), memmap_init_range(),
> __init_zone_device_page()?
>
> 1. are they having any PFN range overlapping with kho?
> 2. is kho_scratch NULL for them?
>
> 1 tells us whether putting code in init_pageblock_migratetype() could save
> the hassle of changing all above locations.
> 2 tells us how many callers are affected by traversing kho_scratch.

I could try answering those questions, but:

1. I'm new to this code, so I'm not sure how reliable my answers would
be.

2. If you're not using CONFIG_KEXEC_HANDOVER, the performance penalty
is zero. If you are using it, you currently have to disable
CONFIG_DEFERRED_STRUCT_PAGE_INIT, and the performance hit from that is
far, far greater. This solution saves 0.5s on my setup (100GB of
memory). We can always improve the performance further in the future.
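
To illustrate the cost argument, here is a rough user-space sketch of
the kind of check being discussed. It is not the code from
kernel/liveupdate/kexec_handover.c: struct scratch_range, scratch_tbl,
scratch_cnt and scratch_overlap() are hypothetical names, and the
early NULL return is my assumption about how the "kho_scratch is NULL"
case mentioned earlier in the thread would behave.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one scratch area descriptor. */
struct scratch_range {
	uint64_t addr;	/* physical start address */
	uint64_t size;	/* length in bytes */
};

/* Assumed to stay NULL until the scratch areas have been set up. */
static struct scratch_range *scratch_tbl;
static size_t scratch_cnt;

/* Return true if [phys, phys + size) intersects any scratch area. */
static bool scratch_overlap(uint64_t phys, uint64_t size)
{
	size_t i;

	/* No table yet: callers pay for one pointer test only. */
	if (!scratch_tbl)
		return false;

	for (i = 0; i < scratch_cnt; i++) {
		uint64_t start = scratch_tbl[i].addr;
		uint64_t end = start + scratch_tbl[i].size;

		/* Half-open interval intersection test. */
		if (phys < end && start < phys + size)
			return true;
	}

	return false;
}
```

With a table of only a few entries, the loop adds a handful of
comparisons per call, and callers that run before the table exists
only pay for the NULL test.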

> Thanks.
>
> Best Regards,
> Yan, Zi