[PATCH v12 7/7] x86/crash: Add x86 crash hotplug support

Fri Oct 28 08:29:45 PDT 2022

On 10/28/22 05:19, Borislav Petkov wrote:
> On Thu, Oct 27, 2022 at 02:24:11PM -0500, Eric DeVolder wrote:
>> Be aware, in reality, that if the system was fully populated, it would not
>> actually consume all 8192 phdrs. Rather /proc/iomem would essentially show a
>> large contiguous address space which would require just a single phdr.
> 
> Then that from below:
> 
> 	pnum += CONFIG_CRASH_MAX_MEMORY_RANGES;
> 
> which then would end up allocating 8192 would be a total waste.
> 
> So why don't you make that number dynamic then?
> 
> You start with something sensible:
> 
> 	total_num_pheaders = num_online_cpus() + "some number of regions" + "some few others"
> 
> I.e., a number which is a good compromise on the majority of machines.
> 
> Then, on hotplug events you count how many new regions are coming in
> and when you reach the total_num_pheaders number, you double it (or some
> other increase stragegy), reallocate the ELF header buffers etc needed
> for kdump and you're good.
> 
> This way, you don't waste memory unnecessarily on the majority of
> systems and those who need more, get to allocate more.

This patch series sizes and allocates the memory buffer/segment for the elfcorehdr once, at kdump 
load time.

In order to dynamically resize the elcorehdr memory buffer/segment, that causes the following ripple 
effects:

  - Permitting resizing of the elfcorehdr requires a means to "allocate" a new size buffer from 
within the crash kernel reserved area. There is no allocator today; currently it is a kind of 
one-pass placement process that happens at load time. The critical side effect of allocating a new 
elfcorehdr buffer memory/segment is that it creates a new address for the elfcorehdr.

  - The elfcorehdr is passed to the crash kernel via the elfcorehdr= kernel cmdline option. As such, 
a dynamic change to the size of the elfcorehdr size necessarily invites a change of address of that 
buffer, and therefore a change to rewrite the crash kernel cmdline to reflect the new elfcorehdr 
buffer address.

  - A change to the cmdline, also invites a possible change of address of the buffer containing the 
cmdline, and thus a change to the x86 boot_params, which contains the cmdline pointer.

  - A change to the cmdline and/or boot_params, which are *not* excluded from the hash/digest, means 
that the purgatory hash/digest needs to be recomputed, and purgatory re-linked with the new 
hash/digest and replaced.

A fair amount of work, but I have had this working in the past, around the v1 patch series 
timeframe. However, it only worked for the kexec_file_load() syscall as all the needed pieces of 
information were available; but for kexec_load(), it isn't possible to relink purgatory as by that 
point purgatory is but a user-space binary blob.

It was feedback on the v1/v2 that pointed out that by excluding the elfcorehdr from the hash/digest, 
the "change of address" problem with the elfcorehdr buffer/segment goes away, and, in turn, negates 
the need to: introduce an allocator for the crash kernel reserved space, rewrite the crash kernel 
cmdline with a new elfcorehdr, update boot_params with a new cmdline and re-link and replace 
purgatory with the updated digest. And it enables this hotplug efforts to support kexec_load() 
syscall as well.

So it is with this in mind that I suggest we stay with the statically sized elfcorehdr buffer.

If that can be agreed upon, then it is "just a matter" of picking a useful elfcorehdr size. 
Currently that size is derived from the NR_DEFAULT_CPUS and CRASH_MAX_MEMORY_RANGES. So, there is 
still the CRASH_MAX_MEMORY_RANGES knob to help a dial in size, should there be some issue with the 
default value/size.

Or if there is desire to drop computing the size from NR_DEFAULT_CPUs and CRASH_MAX_MEMORY_RANGES 
and simply go with CRASH_HOTPLUG_ELFCOREHDR_SZ which simply specifies the buffer size, then I'm also 
good with that.

I still owe a much better explanation of how to size the elfcorehdr. I can use the comments and 
ideas from the discussion to provide the necessary insight when choosing this value, whether that be 
CRASH_MAX_MEMORY_RANGES or CRASH_HOTPLUG_ELFCOREHDR_SZ.

> 
>> I'd prefer keeping CRASH_MAX_MEMORY_RANGES as that allow the maximum phdr
>> number value to be reflective of CPUs and/or memory; not all systems support
>> both CPU and memory hotplug. For example, I have queued up this change to
>> reflect this:
>>
>>      if (IS_ENABLED(CONFIG_HOTPLUG_CPU) || IS_ENABLED(CONFIG_MEMORY_HOTPLUG)) {
> 
> If you're going to keep CRASH_MAX_MEMORY_RANGES, then you can test only
> that thing as it expresses the dependency on CONFIG_HOTPLUG_CPU and
> CONFIG_MEMORY_HOTPLUG already.
> 
> If you end up making the number dynamic, then you could make that a
> different Kconfig item which contains all that crash code as most of the
> people won't need it anyway.

It is my intention to correct the CRASH_MAX_MEMORY_RANGES (if we keep it) as such:

config CRASH_MAX_MEMORY_RANGES
     depends on CRASH_DUMP && KEXEC_FILE && MEMORY_HOTPLUG

CRASH_MAX_MEMORY_RANGES should have never had CPU_HOTPLUG as a dependency; that was a cut-n-paste 
error on my part.

> 
> Hmm?
> 

Thank you for the time and thought on this topic!
eric