[PATCH v1 3/4] memblock: add MEMBLOCK_DRIVER_MANAGED to mimic IORESOURCE_SYSRAM_DRIVER_MANAGED

David Hildenbrand david at redhat.com
Fri Oct 1 01:04:24 PDT 2021


On 30.09.21 23:21, Mike Rapoport wrote:
> On Wed, Sep 29, 2021 at 06:54:01PM +0200, David Hildenbrand wrote:
>> On 29.09.21 18:39, Mike Rapoport wrote:
>>> Hi,
>>>
>>> On Mon, Sep 27, 2021 at 05:05:17PM +0200, David Hildenbrand wrote:
>>>> Let's add a flag that corresponds to IORESOURCE_SYSRAM_DRIVER_MANAGED.
>>>> Similar to MEMBLOCK_HOTPLUG, most infrastructure has to treat such memory
>>>> like ordinary MEMBLOCK_NONE memory -- for example, when selecting memory
>>>> regions to add to the vmcore for dumping in the crashkernel via
>>>> for_each_mem_range().
>>> Can you please elaborate on the difference in semantics of MEMBLOCK_HOTPLUG
>>> and MEMBLOCK_DRIVER_MANAGED?
>>> Unless I'm missing something they both mark memory that can be unplugged
>>> anytime and so it should not be used in certain cases. Why is there a need
>>> for a new flag?
>>
>> In the cover letter I have "Alternative B: Reuse MEMBLOCK_HOTPLUG.
>> MEMBLOCK_HOTPLUG serves a different purpose, though.", but looking into the
>> details it won't work as is.
>>
>> MEMBLOCK_HOTPLUG is used to mark memory early during boot that can later get
>> hotunplugged again and should be placed into ZONE_MOVABLE if the
>> "movable_node" kernel parameter is set.
>>
>> The confusing part is that we talk about "hotpluggable" but really mean
>> "hotunpluggable": the reason is that HW flags DIMM slots that can later be
>> hotplugged as "hotpluggable" even though there is already something
>> hotplugged.
> 
> MEMBLOCK_HOTPLUG name is indeed somewhat confusing, but still its core
> meaning "this memory may be removed" which does not differ from what
> IORESOURCE_SYSRAM_DRIVER_MANAGED means.
> 
> MEMBLOCK_HOTPLUG regions are indeed placed into ZONE_MOVABLE, but more
> importantly, they are avoided when we allocate memory from memblock.
> 
> So, in my view, both flags mean that the memory may be removed and it
> should not be used for certain types of allocations.

The semantics are different:

MEMBLOCK_HOTPLUG: memory is indicated as "System RAM" in the 
firmware-provided memory map and added to the system early during boot; 
we want this memory to be managed by ZONE_MOVABLE with "movable_node" 
set on the kernel command line, because only then do we want it to be
hotunpluggable. kexec *has to* indicate this memory to the second
kernel and can place kexec-images on this memory. After memory 
hotunplug, kexec has to be re-armed.

MEMBLOCK_DRIVER_MANAGED: memory is not indicated as "System RAM" in the
firmware-provided memory map; this memory is always detected and added
to the system by a driver; memory might not actually be physically
hotunpluggable and the ZONE selection does not depend on "movable_node".
kexec *must not* indicate this memory to the second kernel and *must 
not* place kexec-images on this memory.
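To make the contrast concrete, here is a minimal userspace sketch of how the two definitions above translate into per-range decisions. It is a hypothetical model, not kernel code: the numeric enum values, struct range_policy and classify() are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; numeric values are illustrative only. */
enum memblock_flags {
	MEMBLOCK_NONE           = 0x0,
	MEMBLOCK_HOTPLUG        = 0x1,
	MEMBLOCK_DRIVER_MANAGED = 0x8,
};

/* What kexec and zone selection may do with a range carrying @flags. */
struct range_policy {
	bool report_to_second_kernel;   /* pass it on as System RAM */
	bool may_hold_kexec_images;     /* load kexec images here */
	bool zone_follows_movable_node; /* ZONE_MOVABLE only with "movable_node" */
};

static struct range_policy classify(unsigned int flags)
{
	/* Only the two flags discussed here are modeled. */
	assert((flags & ~(unsigned int)(MEMBLOCK_HOTPLUG |
					MEMBLOCK_DRIVER_MANAGED)) == 0);

	bool driver_managed = flags & MEMBLOCK_DRIVER_MANAGED;
	struct range_policy p = {
		/* Driver-managed memory is hidden from the second kernel ... */
		.report_to_second_kernel = !driver_managed,
		/* ... and must never hold kexec images. MEMBLOCK_HOTPLUG
		 * memory may, but kexec has to be re-armed after hotunplug. */
		.may_hold_kexec_images = !driver_managed,
		/* Only MEMBLOCK_HOTPLUG memory ties its zone to "movable_node";
		 * zone selection for driver-managed memory does not. */
		.zone_follows_movable_node =
			!driver_managed && (flags & MEMBLOCK_HOTPLUG),
	};
	return p;
}
```

MEMBLOCK_HOTPLUG and MEMBLOCK_DRIVER_MANAGED come out opposite on both kexec questions, which is why reusing one flag for both would mix concepts.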


I would really advise against mixing concepts here.


What we could do is indicate *all* hotplugged memory (not just 
IORESOURCE_SYSRAM_DRIVER_MANAGED memory) as MEMBLOCK_HOTPLUG and make 
MEMBLOCK_HOTPLUG less dependent on "movable_node".

MEMBLOCK_HOTPLUG for early boot memory: with "movable_node", place it in
ZONE_MOVABLE. Even without "movable_node", don't place early kernel
allocations on this memory.
MEMBLOCK_HOTPLUG for all memory: don't place kexec images on this
memory, independent of "movable_node".
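A rough sketch of the proposed MEMBLOCK_HOTPLUG semantics, again as a hypothetical userspace model: pick_early_zone(), early_alloc_may_use(), kexec_may_place_image(), enum zone_pick and the flag value are illustrative, not existing kernel interfaces.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative value; hypothetical userspace model of the proposal. */
#define MEMBLOCK_HOTPLUG 0x1u

/* PICK_KERNEL_ZONE stands for a kernel zone such as ZONE_NORMAL. */
enum zone_pick { PICK_KERNEL_ZONE, PICK_ZONE_MOVABLE };

/* Early boot memory: ZONE_MOVABLE only with "movable_node". */
static enum zone_pick pick_early_zone(unsigned int flags, bool movable_node)
{
	assert((flags & ~MEMBLOCK_HOTPLUG) == 0); /* only this flag is modeled */
	if ((flags & MEMBLOCK_HOTPLUG) && movable_node)
		return PICK_ZONE_MOVABLE;
	return PICK_KERNEL_ZONE;
}

/* Early boot memory: keep early kernel allocations off MEMBLOCK_HOTPLUG
 * memory, even without "movable_node". */
static bool early_alloc_may_use(unsigned int flags)
{
	return !(flags & MEMBLOCK_HOTPLUG);
}

/* All memory (early or hotplugged later): no kexec images on
 * MEMBLOCK_HOTPLUG memory, independent of "movable_node". */
static bool kexec_may_place_image(unsigned int flags)
{
	return !(flags & MEMBLOCK_HOTPLUG);
}
```

The last two helpers deliberately ignore "movable_node"; that is the "less dependent on movable_node" part of the idea.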


memblock would then not contain the information "contained in 
firmware-provided memory map" vs. "not contained in firmware-provided 
memory map"; but I don't think that information is strictly required
right now if we went down that path.

>   
>> For example, ranges in the ACPI SRAT that are marked as
>> ACPI_SRAT_MEM_HOT_PLUGGABLE will be marked MEMBLOCK_HOTPLUG early during
>> boot (drivers/acpi/numa/srat.c:acpi_numa_memory_affinity_init()). Later, we
>> use that information to size ZONE_MOVABLE
>> (mm/page_alloc.c:find_zone_movable_pfns_for_nodes()). This will make sure
>> that these "hotpluggable" DIMMs can later get hotunplugged.
>>
>> Also, see should_skip_region() how this relates to the "movable_node" kernel
>> parameter:
>>
>> 	/* skip hotpluggable memory regions if needed */
>> 	if (movable_node_is_enabled() && memblock_is_hotpluggable(m) &&
>> 	    (flags & MEMBLOCK_HOTPLUG))
>> 		return true;
> 
> Hmm, I think that the movable_node_is_enabled() check here is excessive,
> but I suspect we cannot simply remove it without breaking anything.

The reasoning is: without "movable_node" we don't want this memory to be
hotunpluggable; consequently, we don't care if we place kexec-images on
this memory. MEMBLOCK_HOTPLUG is currently only active with "movable_node".

If we remove that check, we will never place early kernel
allocations on that memory, even if we don't care about ZONE_MOVABLE.
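To illustrate what removing movable_node_is_enabled() from the quoted check would change, a small hypothetical model. All helper names and the flag value are made up, and the third operand of the real condition (the caller's allocation flags) is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

#define MEMBLOCK_HOTPLUG 0x1u /* illustrative value */

/* The quoted check, reduced to its first two operands. */
static bool skip_with_movable_check(bool movable_node, unsigned int region_flags)
{
	return movable_node && (region_flags & MEMBLOCK_HOTPLUG);
}

/* Hypothetical variant with movable_node_is_enabled() removed. */
static bool skip_without_movable_check(unsigned int region_flags)
{
	return region_flags & MEMBLOCK_HOTPLUG;
}

/* True if dropping the check changes whether a region is skipped. */
static bool behaviour_changes(bool movable_node, unsigned int region_flags)
{
	bool before = skip_with_movable_check(movable_node, region_flags);
	bool after = skip_without_movable_check(region_flags);

	assert(!before || after); /* dropping a condition never un-skips */
	return before != after;
}
```

Only the combination of "movable_node" off plus a hotpluggable region flips: such regions would start being skipped for early allocations.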

> 
> I'll take a deeper look at the potential consequences.
> 
> BTW, is there anything that prevents putting kexec into hot-unpluggable memory
> that was cold-plugged on boot?

I think it depends on how the platform handles hotunpluggable DIMMs or 
hotunpluggable NUMA nodes. If the platform ends up indicating such memory
via MEMBLOCK_HOTPLUG, and "movable_node" is set, memory would be put
into ZONE_MOVABLE and kexec would not place kexec-images on that memory.

-- 
Thanks,

David / dhildenb




More information about the kexec mailing list