[PATCH v12 7/7] x86/crash: Add x86 crash hotplug support

Fri Sep 30 10:11:26 PDT 2022

On 9/30/22 11:50, Borislav Petkov wrote:
> On Fri, Sep 30, 2022 at 10:36:49AM -0500, Eric DeVolder wrote:
>>> Your help text talks about System RAM entries in /proc/iomem which means
>>> that those entries are present somewhere in the kernel and you can read
>>> them out and do the proper calculations dynamically instead of doing the
>>> static CONFIG_NR_CPUS_DEFAULT + CONFIG_CRASH_MAX_MEMORY_RANGES thing.
>>
>> The intent is to compute the max size buffer needed to contain a maximum
>> populated elfcorehdr, which is primarily based on the number of CPUs and
>> memory regions. Thus far I (and others involved) have not found a kernel
>> method to determine the maximum number of memory regions possible (if you
>> are aware of one, please let me know!). Thus CONFIG_CRASH_MAX_MEMORY_RANGES
>> was born (rather borrowed from kexec-tools).
> 
> Let's ask some mm folks.
> 
> mm folks, is there a way to enumerate all the memory regions a machine
> has?
> 
> It looks to me like register_memory_resource() in mm/memory_hotplug.c
> does register the resource so there should be a way to count that list
> of resources or at least maintain a count somewhere so that kexec/crash
> code can know how big its elfcodehdr buffer should be instead of doing a
> clumsy Kconfig item where people would need to guess...
> 
> Hmm.
> 

There is of course a way to enumerate the memory regions in use on the machine, that is not what 
this code needs. In order to compute the maximum buffer size needed (this buffer size is computed 
once), the count of the maximum number of memory regions possible (even if not currently in use) is 
what is needed.

>>> +#ifdef CONFIG_CRASH_MAX_MEMORY_RANGES
>> So I think the use of CONFIG_CRASH_MAX_MEMORY_RANGES is not correct; it
>> still needs to be based on the cpu or memory hotplug options.
> 
> You're kidding, right?
> 
> +config CRASH_MAX_MEMORY_RANGES
> +	depends on CRASH_DUMP && KEXEC_FILE && (HOTPLUG_CPU || MEMORY_HOTPLUG)
> 	^^^^^^^^^^				^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Oh, that would be an error of haste on my part. This should be:
   depends on CRASH_DUMP && MEMORY_HOTPLUG

> 
>>> @@ -622,6 +622,15 @@ static int __init crash_save_vmcoreinfo_init(void)
>>>    subsys_initcall(crash_save_vmcoreinfo_init);
>>>    #if defined(CONFIG_HOTPLUG_CPU) || defined(CONFIG_MEMORY_HOTPLUG)
>>> +
>>> +void __weak *arch_map_crash_pages(unsigned long paddr, unsigned long size)
>>> +{
>>> +	return NULL;
>>> +}
>>> +
>>> +void __weak arch_unmap_crash_pages(void **ptr) { }
>>> +void __weak arch_crash_handle_hotplug_event(struct kimage *image, unsigned int hp_action) { }
>>> +
>> I was asked by Baoquan He to eliminate the use of __weak
> 
> Because?
> 

Baoquan pointed me to:

https://lore.kernel.org/lkml/cover.1656659357.git.naveen.n.rao@linux.vnet.ibm.com/T/

Thanks,
eric