[PATCH v3] mm/hugetlb: split hugetlb_cma in nodes with memory
Mike Kravetz
mike.kravetz at oracle.com
Mon Jul 20 14:17:31 EDT 2020
On 7/19/20 11:22 PM, Anshuman Khandual wrote:
>
>
> On 07/17/2020 10:32 PM, Mike Kravetz wrote:
>> On 7/16/20 10:02 PM, Anshuman Khandual wrote:
>>>
>>>
>>> On 07/16/2020 11:55 PM, Mike Kravetz wrote:
>>>> From 17c8f37afbf42fe7412e6eebb3619c6e0b7e1c3c Mon Sep 17 00:00:00 2001
>>>> From: Mike Kravetz <mike.kravetz at oracle.com>
>>>> Date: Tue, 14 Jul 2020 15:54:46 -0700
>>>> Subject: [PATCH] hugetlb: move cma reservation to code setting up gigantic
>>>> hstate
>>>>
>>>> Instead of calling hugetlb_cma_reserve() directly from arch specific
>>>> code, call from hugetlb_add_hstate when adding a gigantic hstate.
>>>> hugetlb_add_hstate is either called from arch specific huge page setup,
>>>> or as the result of hugetlb command line processing. In either case,
>>>> this is late enough in the init process that all numa memory information
>>>> should be initialized. And, it is early enough to still use the early
>>>> memory allocator.
>>>
>>> This assumes that hugetlb_add_hstate() is called from the arch code at
>>> the right point in time for the generic HugeTLB to do the required CMA
>>> reservation which is not ideal. I guess there must have been a reason why
>>> CMA reservation should always be called by the platform code which knows
>>> the boot sequence timing better.
>>
>> Actually, the code does not make the assumption that hugetlb_add_hstate
>> is called from arch specific huge page setup. It can even be called later
>> at the time of hugetlb command line processing.
>
> Yes, now that hugetlb_cma_reserve() has been moved into hugetlb_add_hstate().
> But then there is an explicit warning while trying to mix both the command
> line options i.e. hugepagesz= and hugetlb_cma=. The proposed code here has
> not changed that behavior and hence the following warning should have been
> triggered here as well.
>
> 1) hugepagesz_setup()
>      hugetlb_add_hstate()
>        hugetlb_cma_reserve()
>
> 2) hugepages_setup()
>      hugetlb_hstate_alloc_pages() when order >= MAX_ORDER
>
>      if (hstate_is_gigantic(h)) {
>              if (IS_ENABLED(CONFIG_CMA) && hugetlb_cma[0]) {
>                      pr_warn_once("HugeTLB: hugetlb_cma is enabled, skip boot time allocation\n");
>                      break;
>              }
>              if (!alloc_bootmem_huge_page(h))
>                      break;
>      }
>
> Nonetheless, it does not make sense to mix both memblock and CMA based huge
> page pre-allocations. But looking at this again, could this warning ever have
> been triggered until now? Only if a given platform calls hugetlb_cma_reserve()
> before __setup("hugepages=", hugepages_setup). Anyways, there seems to be
> good reasons to keep both memblock and CMA based pre-allocations in place.
> But mixing them together (as done in the proposed code here) does not seem
> to be right.

I'm not sure if I follow the question.

This proposal does not change the trigger for the warning printed when one
tries to both reserve CMA and pre-allocate gigantic pages. If hugetlb_cma
is specified on the command line, and someone tries to pre-allocate gigantic
pages they will get the warning. Such a command line on x86 might look like,

	hugetlb_cma=4G hugepagesz=1G hugepages=4

You will then see,

[ 0.065864] HugeTLB: hugetlb_cma is enabled, skip boot time allocation
[ 0.065866] HugeTLB: allocating 4 of page size 1.00 GiB failed. Only allocated 0 hugepages.

Ideally we could/should eliminate the second message.
This behavior exists in the current code.

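
To make the two-message behavior concrete, here is a minimal userspace sketch
of the gating logic. It is only a model, not kernel code: struct boot_cfg and
model_prealloc() are hypothetical names, and the model assumes every
allocation that is not skipped would succeed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of the boot command line; not a kernel structure. */
struct boot_cfg {
	bool cma_requested;      /* hugetlb_cma= was given */
	bool gigantic;           /* requested page size is gigantic */
	unsigned long requested; /* hugepages=N */
};

/*
 * Returns the number of pages "pre-allocated" and counts the messages
 * that would be printed. Assumption: allocations that are not skipped
 * always succeed in this model.
 */
static unsigned long model_prealloc(const struct boot_cfg *cfg, int *msgs)
{
	unsigned long i;

	for (i = 0; i < cfg->requested; i++) {
		if (cfg->gigantic && cfg->cma_requested) {
			puts("hugetlb_cma is enabled, skip boot time allocation");
			(*msgs)++;
			break;
		}
	}

	/* The shortfall report fires even when the skip was deliberate. */
	if (i < cfg->requested) {
		printf("allocating %lu failed. Only allocated %lu hugepages.\n",
		       cfg->requested, i);
		(*msgs)++;
	}
	return i;
}
```

In this model the shortfall report has no way to know that the loop ended
because of the CMA check, which is why the second message shows up.
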
>> My 'reasoning' is that gigantic pages can currently be preallocated from
>> bootmem/memblock_alloc at the time of command line processing. Therefore,
>> we should be able to reserve bootmem for CMA at the same time. Is there
>> something wrong with this reasoning? I tested this on x86 by removing the
>> call to hugetlb_add_hstate from arch specific code and instead forced the
>> call at command line processing time. The ability to reserve CMA was the
>> same.
>
> There is no problem with that reasoning. __setup() triggered function should
> be able to perform CMA reservation. But as pointed out before, it does not make
> sense to mix both CMA reservation and memblock based pre-allocation.

Agreed. I am not proposing that we do. Sorry if you got that impression.

>> Yes, the CMA reservation interface says it should be called from arch
>> specific code. However, if we currently depend on the ability to do
>> memblock_alloc at hugetlb command line processing time for gigantic page
>> preallocation, then I think we can do the CMA reservation here as well.
>
> IIUC, CMA reservation and memblock alloc have some differences in terms of
> how the memory can be used later on, will have to dig deeper on this. But
> the comment section near cma_declare_contiguous_nid() is a concern.
>
> * This function reserves memory from early allocator. It should be
> * called by arch specific code once the early allocator (memblock or bootmem)
> * has been activated and all other subsystems have already allocated/reserved
> * memory. This function allows to create custom reserved areas.
>

Yes, that is the comment I was looking at as well.

However, note that hugetlb pre-allocation of gigantic pages will end up
calling memblock_alloc_range_nid. This is the same routine used for CMA
reservations/allocations from cma_declare_contiguous_nid. This is why
there should be no issue with doing CMA reservations at this time.

This may be the confusing part. I am not saying we would do CMA reservations
and pre-allocations together. Rather, they both rely on the same underlying
code, so either one can be called at the same point in the init process.

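
The shared-dependency argument can be sketched as a small userspace model.
Everything here is hypothetical and for illustration only: early_alloc_range()
merely stands in for the common early-allocator entry point, and the two
wrappers stand in for the pre-allocation and CMA reservation paths.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical boot phases; illustrative names only. */
enum boot_phase { EARLY_ALLOC_ACTIVE, PAGE_ALLOC_READY };

struct early_pool {
	enum boot_phase phase;
	size_t free_bytes;
};

/* Stand-in for the shared early-allocator routine both paths end up in. */
static bool early_alloc_range(struct early_pool *pool, size_t size)
{
	if (pool->phase != EARLY_ALLOC_ACTIVE || size > pool->free_bytes)
		return false;
	pool->free_bytes -= size;
	return true;
}

/* Model of gigantic page pre-allocation. */
static bool model_prealloc_gigantic(struct early_pool *pool, size_t size)
{
	return early_alloc_range(pool, size);
}

/* Model of CMA reservation; same dependency, so same timing window. */
static bool model_cma_reserve(struct early_pool *pool, size_t size)
{
	return early_alloc_range(pool, size);
}
```

Because both wrappers succeed or fail under exactly the same conditions in
this model, any point in boot where one works is a point where the other
works too.
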
>> Thinking about it some more, I suppose there could be some arch code that
>> could call hugetlb_add_hstate too early in the boot process. But, I do
>> not think we have an issue with calling it too late.
>>
>
> Calling it too late could mean the page allocator is already fully
> initialized, after which CMA reservation would not be possible. Also calling
> it too early could starve other subsystems which might need memory reserved
> in specific physical ranges.

I thought about it some more and came up with a way to do all this at command
line processing time. It will take me a day or two to put together.

The patch from Barry which started this thread is indeed needed and is in
Andrew's tree. I'll start another thread with a patch to move CMA reservations
to command line processing.
--
Mike Kravetz