[RFC PATCH 8/8] HACK: mm: memory_hotplug: Drop memblock_phys_free() call in try_remove_memory()
David Hildenbrand
david at redhat.com
Wed Jun 5 01:23:33 PDT 2024
On 05.06.24 10:00, Mike Rapoport wrote:
> On Tue, Jun 04, 2024 at 11:39:27AM +0200, David Hildenbrand wrote:
>> On 04.06.24 11:35, Mike Rapoport wrote:
>>> On Mon, Jun 03, 2024 at 10:53:03PM +0200, David Hildenbrand wrote:
>>>> On 03.06.24 12:43, Mike Rapoport wrote:
>>>>> On Mon, Jun 03, 2024 at 11:14:00AM +0200, David Hildenbrand wrote:
>>>>
>>>>> The commit that added memblock_free() at the first place (f9126ab9241f
>>>>> ("memory-hotplug: fix wrong edge when hot add a new node")) does not really
>>>>> describe why that was required :(
>>>>>
>>>>> But at a quick glance it looks completely spurious.
>>>>
>>>> There are more details [1] but I also did not figure out why the
>>>> memblock_free() was really required to resolve that issue.
>>>>
>>>> [1] https://marc.info/?l=linux-kernel&m=142961156129456&w=2
>>> The tinkering with memblock there and in f9126ab9241f seem bogus in the
>>> context of memory hotplug on x86.
>>>
>>> I believe that dropping that memblock_phys_free() is right thing to do
>>> regardless of this series. There's no corresponding memblock_alloc() and it
>>> was added as part of a fix for hotunplug on x86 that anyway had memblock
>>> discarded at that point.
>>
>> So when we re-add that memory, we might have still ranges as "reserved".
>
> I don't see how anything can become reserved on the hotplug path unless
> hotplug is possible before mm_core_init().
> There are no memblock_reserve() calls in memory_hotplug.c, no memblock
> allocations possible after mm is inited, and even if memblock_add() will
> need to allocate memory that will be done via slab.
I had the following in mind:

(1) DIMM is part of boot mem and some boot allocation ended up on it
(2) That boot allocation got freed after the buddy was already up
    (memblock.reserved not updated)
(3) We succeed in offlining the memory and unplugging the DIMM

Now we have some "reserved" leftover from memory that is no longer
physically around.

(4) We re-plug a DIMM at that position, possibly with a different NUMA
    assignment.

On bare metal, this is unlikely to happen. With current QEMU, it won't
happen because (hotunpluggable) DIMMs are usually not part of bootmem;
that is, e820 and friends only indicate it as a "hotpluggable but not
present memory" range. It could be possible after kexec (for example, we
add that memory to the e820 of the new kernel), but that's rather a
corner case.

(3) is already unlikely to happen, so removing that memblock_phys_free()
probably won't change anything in practice.
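
To make steps (1)-(4) concrete, here is a toy user-space model of the
scenario. It is only a sketch, not kernel code: every toy_* name, the
fixed-size array, and the addresses are made up for illustration, and the
"free" helper only drops entries fully inside the removed range, which is
simpler than what memblock really does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for memblock.reserved; NOT the kernel's data structure. */
#define TOY_MAX_REGIONS 8

struct toy_region {
	uint64_t base;
	uint64_t size;
	int nid;	/* node recorded at reservation time (illustrative) */
	bool used;
};

struct toy_reserved {
	struct toy_region regions[TOY_MAX_REGIONS];
};

/* Record [base, base + size) as reserved; returns false if the table is full. */
static bool toy_reserve(struct toy_reserved *r, uint64_t base,
			uint64_t size, int nid)
{
	for (size_t i = 0; i < TOY_MAX_REGIONS; i++) {
		if (!r->regions[i].used) {
			r->regions[i] = (struct toy_region){ base, size, nid, true };
			return true;
		}
	}
	return false;
}

/*
 * Plays the role of the memblock_phys_free() call on the removed range:
 * drop every entry fully contained in [base, base + size).
 */
static void toy_phys_free(struct toy_reserved *r, uint64_t base, uint64_t size)
{
	for (size_t i = 0; i < TOY_MAX_REGIONS; i++) {
		struct toy_region *reg = &r->regions[i];

		if (reg->used && reg->base >= base &&
		    reg->base + reg->size <= base + size)
			reg->used = false;
	}
}

/* Count entries that overlap [base, base + size). */
static size_t toy_count_overlaps(const struct toy_reserved *r,
				 uint64_t base, uint64_t size)
{
	size_t n = 0;

	for (size_t i = 0; i < TOY_MAX_REGIONS; i++) {
		const struct toy_region *reg = &r->regions[i];

		if (reg->used && reg->base < base + size &&
		    base < reg->base + reg->size)
			n++;
	}
	return n;
}

/* Walk through (1)-(4); returns the stale entries visible at re-plug time. */
static size_t toy_replug_scenario(bool free_on_remove)
{
	struct toy_reserved reserved = { 0 };
	const uint64_t dimm_base = 0x100000000ULL;	/* made-up address */
	const uint64_t dimm_size = 0x40000000ULL;	/* made-up size */

	/* (1) a boot allocation lands on the DIMM, recorded for node 0 */
	toy_reserve(&reserved, dimm_base + 0x1000, 0x2000, 0);

	/* (2) freed to the buddy later: the reserved list is not touched */

	/* (3) offline + unplug; optionally clean up the reserved list */
	if (free_on_remove)
		toy_phys_free(&reserved, dimm_base, dimm_size);

	/* (4) re-plug at the same position: what is still "reserved"? */
	return toy_count_overlaps(&reserved, dimm_base, dimm_size);
}
```

In this model, toy_replug_scenario(true) leaves nothing behind, while
toy_replug_scenario(false) leaves the one stale entry from step (1), still
carrying the old node id.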
>
>> It does sound weird, but you're the boss :)
>
> Nah, it's mm/memory_hotplug.c, so you are :)
>
Well yes :) but it's your decision whether we want to use
memblock.reserved memory to store this persistent NUMA node assignment
for the fixed memory windows. Essentially another user of
memblock.reserved we should then document ;)
Removing the memblock_phys_free() call sounds like a requirement for
that use case (although it might make sense independently).
I'm not sure whether memblock.reserved is the right data structure for
this purpose, but if you think it is, Jonathan would have the green
light on the general approach in this RFC.
--
Cheers,
David / dhildenb