[PATCH v3] mm/hugetlb: split hugetlb_cma in nodes with memory

Mike Kravetz mike.kravetz at oracle.com
Wed Jul 15 12:59:24 EDT 2020


On 7/15/20 1:18 AM, Will Deacon wrote:
>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> index f24acb3af741..a0007d1d12d2 100644
>> --- a/mm/hugetlb.c
>> +++ b/mm/hugetlb.c
>> @@ -3273,6 +3273,9 @@ void __init hugetlb_add_hstate(unsigned int order)
>>  	snprintf(h->name, HSTATE_NAME_LEN, "hugepages-%lukB",
>>  					huge_page_size(h)/1024);
> 
> (nit: you can also make hugetlb_cma_reserve() static and remove its function
> prototypes from hugetlb.h)

Yes thanks.  I threw this together pretty quickly.
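
Something like this for v4 (untested sketch).  Since hugetlb_add_hstate()
sits well above hugetlb_cma_reserve() in mm/hugetlb.c, making it static
also needs a forward declaration near the top of the file:

        /* include/linux/hugetlb.h */
        -void __init hugetlb_cma_reserve(int order);

        /* mm/hugetlb.c, near the top of the file */
        +static void __init hugetlb_cma_reserve(int order);

        /* mm/hugetlb.c, at the definition */
        -void __init hugetlb_cma_reserve(int order)
        +static void __init hugetlb_cma_reserve(int order)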

> 
>> +	if (order >= MAX_ORDER && hugetlb_cma_size)
>> +		hugetlb_cma_reserve(order);
> 
> Although I really like the idea of moving this out of the arch code, I don't
> quite follow the check against MAX_ORDER here -- it looks like a bit of a
> hack to try to intercept the "PUD_SHIFT - PAGE_SHIFT" order which we
> currently pass to hugetlb_cma_reserve(). Maybe we could instead have
> something like:
> 
> 	#ifndef HUGETLB_CMA_ORDER
> 	#define HUGETLB_CMA_ORDER	(PUD_SHIFT - PAGE_SHIFT)
> 	#endif
> 
> and then just do:
> 
> 	if (order == HUGETLB_CMA_ORDER)
> 		hugetlb_cma_reserve(order);
> 
> ? Is there something else I'm missing?
> 

Well, the current hugetlb CMA code only kicks in for gigantic pages as
defined by the hugetlb code. For example, the code to allocate a page
from CMA lives in alloc_gigantic_page(), which is called from
alloc_fresh_huge_page().  That routine starts with:

        if (hstate_is_gigantic(h))
                page = alloc_gigantic_page(h, gfp_mask, nid, nmask);
        else
                page = alloc_buddy_huge_page(h, gfp_mask,
                                nid, nmask, node_alloc_noretry);

and hstate_is_gigantic() is:

static inline bool hstate_is_gigantic(struct hstate *h)
{
        return huge_page_order(h) >= MAX_ORDER;
}

So, everything in the existing code really depends on the hugetlb definition
of a gigantic page (order >= MAX_ORDER).  The 'order >= MAX_ORDER' check in
my proposed patch simply follows that convention.
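
To make the arithmetic concrete (a worked example assuming x86_64 with 4K
pages, where MAX_ORDER is 11; the numbers differ on other configs):

        2MB page:  order = PMD_SHIFT - PAGE_SHIFT = 21 - 12 = 9   -> buddy
        1GB page:  order = PUD_SHIFT - PAGE_SHIFT = 30 - 12 = 18  -> gigantic

So the 'order >= MAX_ORDER' check catches exactly the sizes for which
alloc_fresh_huge_page() would take the alloc_gigantic_page()/CMA path.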

I think the current dependency on the hugetlb definition of gigantic page
may be too simplistic if using CMA for hugetlb pages becomes more common.
Some architectures (sparc, powerpc) have more than one gigantic page size.
Currently there is no way to specify that CMA should be used for one and
not the other.  In addition, I could imagine someone wanting to reserve/use
CMA for non-gigantic (PMD) sized pages.  There is no mechanism for that today.
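
Purely as illustration (hypothetical syntax, nothing like this exists
today), a per-size option could look something like:

        hugetlb_cma=1G:4G,2M:256M

i.e. reserve 4G of CMA for 1G pages and 256M for 2M pages, instead of a
single pool implicitly tied to every gigantic size.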

I honestly have not heard about many use cases for this CMA functionality.
When support was initially added, it was driven by a specific use case, and
the 'all gigantic pages use CMA if defined' implementation was deemed
sufficient.  If there are more use cases, or if this seems too simplistic,
we can revisit that decision.

>> +
>>  	parsed_hstate = h;
>>  }
>>  
>> @@ -5647,7 +5650,10 @@ void __init hugetlb_cma_reserve(int order)
>>  	unsigned long size, reserved, per_node;
>>  	int nid;
>>  
>> -	cma_reserve_called = true;
>> +	if (cma_reserve_called)
>> +		return;
>> +	else
>> +		cma_reserve_called = true;
> 
> (nit: don't need the 'else' here)

Yes, duh!
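
For v4 that hunk becomes simply:

        if (cma_reserve_called)
                return;
        cma_reserve_called = true;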

-- 
Mike Kravetz
