[PATCH] arm64/mm: add fallback option to allocate virtually contiguous memory
sudaraja at codeaurora.org
Thu Sep 10 16:48:40 EDT 2020
On 2020-09-10 03:50, Anshuman Khandual wrote:
> On 09/10/2020 01:57 PM, Steven Price wrote:
>> On 10/09/2020 07:05, Sudarshan Rajagopalan wrote:
>>> When section mappings are enabled, we allocate vmemmap pages from
>>> physically contiguous memory of size PMD_SIZE using
>>> vmemmap_alloc_block_buf(). Section mappings are good for reducing
>>> TLB pressure. But when the system is highly fragmented and memory
>>> blocks are being hot-added at runtime, it's possible that such
>>> physically contiguous memory allocations can fail. Rather than
>>> failing the memory hot-add procedure, add a fallback option to
>>> allocate vmemmap pages from discontiguous pages using
>>> vmemmap_populate_basepages().
>>>
>>> Signed-off-by: Sudarshan Rajagopalan <sudaraja at codeaurora.org>
>>> Cc: Catalin Marinas <catalin.marinas at arm.com>
>>> Cc: Will Deacon <will at kernel.org>
>>> Cc: Anshuman Khandual <anshuman.khandual at arm.com>
>>> Cc: Mark Rutland <mark.rutland at arm.com>
>>> Cc: Logan Gunthorpe <logang at deltatee.com>
>>> Cc: David Hildenbrand <david at redhat.com>
>>> Cc: Andrew Morton <akpm at linux-foundation.org>
>>> Cc: Steven Price <steven.price at arm.com>
>>> ---
>>> arch/arm64/mm/mmu.c | 15 ++++++++++++---
>>> 1 file changed, 12 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>>> index 75df62f..a46c7d4 100644
>>> --- a/arch/arm64/mm/mmu.c
>>> +++ b/arch/arm64/mm/mmu.c
>>> @@ -1100,6 +1100,7 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
>>>  	p4d_t *p4dp;
>>>  	pud_t *pudp;
>>>  	pmd_t *pmdp;
>>> +	int ret = 0;
>>>
>>>  	do {
>>>  		next = pmd_addr_end(addr, end);
>>> @@ -1121,15 +1122,23 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
>>>  			void *p = NULL;
>>>
>>>  			p = vmemmap_alloc_block_buf(PMD_SIZE, node, altmap);
>>> -			if (!p)
>>> -				return -ENOMEM;
>>> +			if (!p) {
>>> +#ifdef CONFIG_MEMORY_HOTPLUG
>>> +				vmemmap_free(start, end, altmap);
>>> +#endif
>>> +				ret = -ENOMEM;
>>> +				break;
>>> +			}
>>>
>>>  			pmd_set_huge(pmdp, __pa(p), __pgprot(PROT_SECT_NORMAL));
>>>  		} else
>>>  			vmemmap_verify((pte_t *)pmdp, node, addr, next);
>>>  	} while (addr = next, addr != end);
>>>
>>> -	return 0;
>>> +	if (ret)
>>> +		return vmemmap_populate_basepages(start, end, node, altmap);
>>> +	else
>>> +		return ret;
>>
>> Style comment: I find this usage of 'ret' confusing. When we assign
>> -ENOMEM above, that is never actually the return value of the
>> function (in that case vmemmap_populate_basepages() provides the
>> actual return value).
>
> Right.
>
>>
>> Also the "return ret" is misleading since we know by that point that
>> ret==0 (and the 'else' is redundant).
>
> Right.
>
>>
>> Can you not just move the call to vmemmap_populate_basepages() up to
>> just after the (possible) vmemmap_free() call and remove the 'ret'
>> variable?
>>
Yes, the usage of "return ret" here is quite confusing and misleading -
I will clean this up.
>> AFAICT the call to vmemmap_free() also doesn't need the #ifdef as the
>> function is a no-op if CONFIG_MEMORY_HOTPLUG isn't set. I also feel
>> you
>
> Right, CONFIG_MEMORY_HOTPLUG is not required.
Not quite - the vmemmap_free() declaration in the include/linux/mm.h
header is itself wrapped in CONFIG_MEMORY_HOTPLUG, and the function
definition in mmu.c sits below the place where it is called. So without
the #ifdef, the call throws an implicit-declaration compile error when
CONFIG_MEMORY_HOTPLUG is not enabled. We could move the function
definition above the call site so the #ifdef isn't needed, but we can
instead go with the 1st approach that Anshuman mentions below.
>
>> need at least a comment to explain Anshuman's point that it looks
>> like you're freeing an unmapped area. Although if I'm reading the
>> code correctly it seems like the unmapped area will just be skipped.
> The proposed vmemmap_free() call attempts to free the entire requested
> vmemmap range [start, end] when an intermediate PMD entry cannot be
> allocated. Hence even if vmemmap_free() could skip an unmapped area
> (will double check on that), it unnecessarily walks large sections of
> unmapped range, which could not have been mapped.
>
> So, basically there could be two different methods for doing this
> fallback.
>
> 1. Call vmemmap_populate_basepages() for the sections where the
>    PMD_SIZE allocation fails
>
>    - vmemmap_free() need not be called
>
> 2. Abort at the first instance of PMD_SIZE allocation failure
>
>    - Call vmemmap_free() to unmap all sections mapped till that point
>    - Call vmemmap_populate_basepages() to map the entire requested
>      range
>
> The proposed patch tried to mix both approaches. Regardless, the first
> approach here seems better, and is also how the
> vmemmap_populate_hugepages() implementation on x86 handles this.
The 1st approach looks cleaner compared to bailing out on the first
failure, unmapping all previously mapped sections, and mapping the
entire request with vmemmap_populate_basepages(). Thanks for the review
and suggestions - will send over a cleaner patch soon.
Sudarshan
--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a
Linux Foundation Collaborative Project