[RFC PATCH 8/8] HACK: mm: memory_hotplug: Drop memblock_phys_free() call in try_remove_memory()

David Hildenbrand david at redhat.com
Fri May 31 02:55:08 PDT 2024


On 31.05.24 11:48, Jonathan Cameron wrote:
> On Fri, 31 May 2024 09:49:32 +0200
> David Hildenbrand <david at redhat.com> wrote:
> 
>> On 29.05.24 19:12, Jonathan Cameron wrote:
>>> I'm not sure what this is balancing, but it if is necessary then the reserved
>>> memblock approach can't be used to stash NUMA node assignments as after the
>>> first add / remove cycle the entry is dropped so not available if memory is
>>> re-added at the same HPA.
>>>
>>> This patch is here to hopefully spur comments on what this is there for!
>>>
>>> Signed-off-by: Jonathan Cameron <Jonathan.Cameron at huawei.com>
>>> ---
>>>    mm/memory_hotplug.c | 2 +-
>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>>> index 431b1f6753c0..3d8dd4749dfc 100644
>>> --- a/mm/memory_hotplug.c
>>> +++ b/mm/memory_hotplug.c
>>> @@ -2284,7 +2284,7 @@ static int __ref try_remove_memory(u64 start, u64 size)
>>>    	}
>>>    
>>>    	if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK)) {
>>> -		memblock_phys_free(start, size);
>>> +		//		memblock_phys_free(start, size);
>>>    		memblock_remove(start, size);
>>>    	}
>>
>> memblock_phys_free() works on memblock.reserved, memblock_remove() works
>>    on memblock.memory.
>>
>> If you take a look at the doc at the top of memblock.c:
>>
>> memblock.memory: physical memory available to the system
>> memblock.reserved: regions that were allocated [during boot]
>>
>>
>> memblock.memory is supposed to be a superset of memblock.reserved. Your
>> "hack" here indicates that you somehow would be relying on the opposite
>> being true, which indicates that you are doing the wrong thing.
>>
>>
>> memblock_remove() indeed balances against memblock_add_node() for
>> hotplugged memory [add_memory_resource()]. There seem to a case where we
>> would succeed in hotunplugging memory that was part of "memblock.reserved".
>>
>> But how could that happen? I think the following way:
>>
>> Once the buddy is up and running, memory allocated during early boot is
>> not freed back to memblock, but usually we simply go via something like
>> free_reserved_page(), not memblock_free() [because the buddy took over].
>> So one could end up unplugging memory that still resides in
>> memblock.reserved set.
>>
>> So with memblock_phys_free(), we are enforcing the invariant that
>> memblock.memory is a superset of memblock.reserved.
>>
>> Likely, arm64 should store that node assignment elsewhere from where it
>> can be queried. Or it should be using something like
>> CONFIG_HAVE_MEMBLOCK_PHYS_MAP for these static windows.
>>
> 
> Hi David,
> 
> Thanks for the explanation and pointers.  I'd rather avoid inventing a parallel
> infrastructure for this (like x86 has for other reasons, but which is also used
> for this purpose).

Yes, although memblock feels a bit wrong, because it is targeted at 
managing actual present memory, not properties about memory that could 
become present later.

>  From a quick look CONFIG_HAVE_MEMBLOCK_PHYS_MAP is documented in a fashion that
> makes me think it's not directly appropriate (this isn't actual physical memory
> available during boot) but the general approach of just adding another memblock
> collection seems like it will work.

Yes. As an alternative, modify the description of 
CONFIG_HAVE_MEMBLOCK_PHYS_MAP.

> 
> Hardest problem might be naming it.  physmem_possible perhaps?
> Fill that with anything found in SRAT or CEDT should work for ACPI, but I'm not
> sure on whether we can fill it when neither of those is present.  Maybe we just
> don't bother as for today's usecase CEDT needs to be present.

You likely only want something for these special windows, with these 
special semantics. That makes it hard.

"physmem_possible" is a bit too generic for my taste, promising 
semantics that might not hold true (we don't want all hotpluggable areas 
to show up there).

> 
> Maybe physmem_known_possible is the way to go. I'll give this approach a spin
> and see how it goes.

Goes into a better direction, or really "physmem_fixed_windows". Because 
also "physmem_known_possible" as a wrong smell to it.

-- 
Cheers,

David / dhildenb




More information about the linux-arm-kernel mailing list