[PATCH v3] mm/hugetlb: split hugetlb_cma in nodes with memory
Will Deacon
will at kernel.org
Wed Jul 15 04:18:22 EDT 2020
Hi Mike,
On Tue, Jul 14, 2020 at 04:21:01PM -0700, Mike Kravetz wrote:
> I agree we should only be concerned with N_MEMORY nodes for the CMA
> reservations. However, this patch got me thinking:
> - Do we really have to initiate the CMA reservations from arch specific code?
> - Can we move the call to reserve CMA a little later into hugetlb arch
> independent code?
>
> I know the cma_declare_contiguous_nid() routine says it should be called
> from arch specific code. However, unless I am missing something that seems
> mostly about timing.
>
> What about a change like this on top of this patch?
>
> From 72b5b9a623f8711ad7f79f1a8f910906245f5d07 Mon Sep 17 00:00:00 2001
> From: Mike Kravetz <mike.kravetz at oracle.com>
> Date: Tue, 14 Jul 2020 15:54:46 -0700
> Subject: [PATCH] hugetlb: move cma allocation call to arch independent code
>
> Instead of calling hugetlb_cma_reserve() from arch specific code,
> call from arch independent code when a gigantic page hstate is
> created. This is late enough in the init process that all NUMA
> memory information should be initialized. And, it is early enough
> to still use the early memory allocator.
>
> Signed-off-by: Mike Kravetz <mike.kravetz at oracle.com>
> ---
> arch/arm64/mm/init.c | 10 ----------
> arch/x86/kernel/setup.c | 9 ---------
> mm/hugetlb.c | 8 +++++++-
> 3 files changed, 7 insertions(+), 20 deletions(-)
>
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 79806732f4b4..ff0ff584dde9 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -427,16 +427,6 @@ void __init bootmem_init(void)
> sparse_init();
> zone_sizes_init(min, max);
>
> - /*
> - * must be done after zone_sizes_init() which calls free_area_init()
> - * that calls node_set_state() to initialize node_states[N_MEMORY]
> - * because hugetlb_cma_reserve() will scan over nodes with N_MEMORY
> - * state
> - */
> -#ifdef CONFIG_ARM64_4K_PAGES
> - hugetlb_cma_reserve(PUD_SHIFT - PAGE_SHIFT);
> -#endif
> -
> memblock_dump_all();
> }
>
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index a1a9712090ae..111c8467fafa 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -1177,15 +1177,6 @@ void __init setup_arch(char **cmdline_p)
>
> x86_init.paging.pagetable_init();
>
> - /*
> - * must be done after zone_sizes_init() which calls free_area_init()
> - * that calls node_set_state() to initialize node_states[N_MEMORY]
> - * because hugetlb_cma_reserve() will scan over nodes with N_MEMORY
> - * state
> - */
> - if (boot_cpu_has(X86_FEATURE_GBPAGES))
> - hugetlb_cma_reserve(PUD_SHIFT - PAGE_SHIFT);
> -
> kasan_init();
>
> /*
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index f24acb3af741..a0007d1d12d2 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -3273,6 +3273,9 @@ void __init hugetlb_add_hstate(unsigned int order)
> snprintf(h->name, HSTATE_NAME_LEN, "hugepages-%lukB",
> huge_page_size(h)/1024);
(nit: you can also make hugetlb_cma_reserve() static and remove its function
prototype from hugetlb.h)
> + if (order >= MAX_ORDER && hugetlb_cma_size)
> + hugetlb_cma_reserve(order);
Although I really like the idea of moving this out of the arch code, I don't
quite follow the check against MAX_ORDER here -- it looks like a bit of a
hack to try to intercept the "PUD_SHIFT - PAGE_SHIFT" order which we
currently pass to hugetlb_cma_reserve(). Maybe we could instead have
something like:
#ifndef HUGETLB_CMA_ORDER
#define HUGETLB_CMA_ORDER (PUD_SHIFT - PAGE_SHIFT)
#endif
and then just do:
if (order == HUGETLB_CMA_ORDER)
hugetlb_cma_reserve(order);
? Is there something else I'm missing?
> +
> parsed_hstate = h;
> }
>
> @@ -5647,7 +5650,10 @@ void __init hugetlb_cma_reserve(int order)
> unsigned long size, reserved, per_node;
> int nid;
>
> - cma_reserve_called = true;
> + if (cma_reserve_called)
> + return;
> + else
> + cma_reserve_called = true;
(nit: don't need the 'else' here)
Will