[PATCH v12 7/7] x86/crash: Add x86 crash hotplug support

Fri Sep 30 08:36:49 PDT 2022

On 9/28/22 11:07, Borislav Petkov wrote:
> On Tue, Sep 13, 2022 at 02:12:31PM -0500, Eric DeVolder wrote:
>> This topic was discussed previously https://lkml.org/lkml/2022/3/3/372.
> 
> Please do not use lkml.org to refer to lkml messages. We have a
> perfectly fine archival system at lore.kernel.org. You simply do
> 
> https://lore.kernel.org/r/<Message-ID>
> 
> when you want to point to a previous mail.

ok, thanks for pointing that out to me.
> 
>> David points out that terminology is tricky here due to differing behaviors.
>> And perhaps that is your point in asking for guidance text. It can be
>> complicated
> 
> Which means you need an explanation how to use this even more.
> 
> And why is CONFIG_CRASH_MAX_MEMORY_RANGES even a Kconfig item and not
> something you discover from the hardware?

No, is the short answer.

> 
> Your help text talks about System RAM entries in /proc/iomem which means
> that those entries are present somewhere in the kernel and you can read
> them out and do the proper calculations dynamically instead of doing the
> static CONFIG_NR_CPUS_DEFAULT + CONFIG_CRASH_MAX_MEMORY_RANGES thing.

The intent is to compute the max size buffer needed to contain a maximum populated elfcorehdr, which 
is primarily based on the number of CPUs and memory regions. Thus far I (and others involved) have 
not found a kernel method to determine the maximum number of memory regions possible (if you are 
aware of one, please let me know!). Thus CONFIG_CRASH_MAX_MEMORY_RANGES was born (rather borrowed 
from kexec-tools).

So no dynamic computation is possible, yet.

> 
>> , but it all comes down to System RAM entries.
>>
>> I could perhaps offer an overly simplified example such that for 1GiB block
>> size, for example, the CRASH_MAX_MEMORY_RANGES of 32768 would allow for 32TiB
>> of memory?
> 
> Yes, and stick it somewhere in Documentation/admin-guide/kdump/ and
> refer to it in that help text so that people can find it and read how to
> use your new option.
> 
ok

>> The kbuf.bufsz value is obtained via a call to prepare_elf_headers(); I can
>> not initialize it at its declaration.
> 
> Sorry, I meant this:
> 
> diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
> index 8fc7d678ac72..ee6fd9f1b2b9 100644
> --- a/arch/x86/kernel/crash.c
> +++ b/arch/x86/kernel/crash.c
> @@ -395,8 +395,9 @@ int crash_load_segments(struct kimage *image)
>   	if (ret)
>   		return ret;
>   
> -	image->elf_headers = kbuf.buffer;
> -	image->elf_headers_sz = kbuf.bufsz;
> +	image->elf_headers	= kbuf.buffer;
> +	image->elf_headers_sz	= kbuf.bufsz;
> +	kbuf.memsz		= kbuf.bufsz;
>   
>   #if defined(CONFIG_HOTPLUG_CPU) || defined(CONFIG_MEMORY_HOTPLUG)
>   	/* Ensure elfcorehdr segment large enough for hotplug changes */
> @@ -407,9 +408,8 @@ int crash_load_segments(struct kimage *image)
>   	image->elf_headers_sz = kbuf.memsz;
>   	image->elfcorehdr_index = image->nr_segments;
>   	image->elfcorehdr_index_valid = true;
> -#else
> -	kbuf.memsz = kbuf.bufsz;
>   #endif
> +
>   	kbuf.buf_align = ELF_CORE_HEADER_ALIGN;
>   	kbuf.mem = KEXEC_BUF_MEM_UNKNOWN;
>   	ret = kexec_add_buffer(&kbuf);
> 
ok

>> I'm at a loss as to what to do differently here. You've raised this issue
>> before and I went back and looked at the suggestions then and I don't see
>> how that applies to this situation. How is this situation different than the
>> #ifdef CONFIG_KEXEC_FILE that immediately preceeds it?
> 
> See the diff at the end. I'm not saying this is how you should do it
> but it should give you a better idea. The logic being, the functions
> in the .c file don't really need ifdeffery around them - you're adding
> 1-2 functions and crash.c is not that big - so they can be built in
> unconditionally. You'd need the ifdeffery *in the header only* when
> crash.c is not being built.
ok; I've overlooked that scenario.
> 
> But I've done it with ifdeffery in the .c file now because yes, the
> kexec code is a minefield of ifdeffery. Hell, there's ifdeffery even in
> the headers for structs. Ifdeffery you don't really need. Someone should
> clean that up and simplify this immensely.

ok

> 
>> Currently there is a concurrent effort for PPC support by Sourabh
>> Jain, and in that effort arch_map_crash_pages() is using __va(paddr).
> 
> Why?
> 
>> I do not know the nuances between kmap_local_page() and __va() to
>> answer the question.
> 
> kmap_local_page() is a generic interface and it should work on any arch.
> 
> And it is documented even:
> 
> $ git grep kmap_local_page Documentation/
> 
>> If kmap_local_page() works for all archs, then I'm happy to drop these
>> arch_ variants and use it directly.
> 
> Yes, pls do.

I'll check with Sourabh to see if PPC can work with kmap_local_page().

> 
> ---
> 
> diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
> index 432073385b2d..b73c9628cd85 100644
> --- a/arch/x86/include/asm/kexec.h
> +++ b/arch/x86/include/asm/kexec.h
> @@ -205,6 +205,17 @@ void *arch_kexec_kernel_image_load(struct kimage *image);
>   
>   int arch_kimage_file_post_load_cleanup(struct kimage *image);
>   #define arch_kimage_file_post_load_cleanup arch_kimage_file_post_load_cleanup
> +
> +#ifdef CONFIG_CRASH_MAX_MEMORY_RANGES
So I think the use of CONFIG_CRASH_MAX_MEMORY_RANGES is not correct; it still needs to be based on 
the cpu or memory hotplug options.

> +void *arch_map_crash_pages(unsigned long paddr, unsigned long size);
> +void arch_unmap_crash_pages(void **ptr);
> +void arch_crash_handle_hotplug_event(struct kimage *image, unsigned int hp_action);
> +#else
> +void *arch_map_crash_pages(unsigned long paddr, unsigned long size) { return NULL; }
> +void arch_unmap_crash_pages(void **ptr) { }
> +void arch_crash_handle_hotplug_event(struct kimage *image, unsigned int hp_action) { }
> +#endif
> +
>   #endif
>   #endif
>   
> diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
> index 8fc7d678ac72..a526c893abe8 100644
> --- a/arch/x86/kernel/crash.c
> +++ b/arch/x86/kernel/crash.c
> @@ -395,8 +395,9 @@ int crash_load_segments(struct kimage *image)
>   	if (ret)
>   		return ret;
>   
> -	image->elf_headers = kbuf.buffer;
> -	image->elf_headers_sz = kbuf.bufsz;
> +	image->elf_headers	= kbuf.buffer;
> +	image->elf_headers_sz	= kbuf.bufsz;
> +	kbuf.memsz		= kbuf.bufsz;
>   
>   #if defined(CONFIG_HOTPLUG_CPU) || defined(CONFIG_MEMORY_HOTPLUG)
>   	/* Ensure elfcorehdr segment large enough for hotplug changes */
> @@ -407,9 +408,8 @@ int crash_load_segments(struct kimage *image)
>   	image->elf_headers_sz = kbuf.memsz;
>   	image->elfcorehdr_index = image->nr_segments;
>   	image->elfcorehdr_index_valid = true;
> -#else
> -	kbuf.memsz = kbuf.bufsz;
>   #endif
> +
>   	kbuf.buf_align = ELF_CORE_HEADER_ALIGN;
>   	kbuf.mem = KEXEC_BUF_MEM_UNKNOWN;
>   	ret = kexec_add_buffer(&kbuf);
> @@ -425,7 +425,8 @@ int crash_load_segments(struct kimage *image)
>   }
>   #endif /* CONFIG_KEXEC_FILE */
>   
> -#if defined(CONFIG_HOTPLUG_CPU) || defined(CONFIG_MEMORY_HOTPLUG)
> +#ifdef CONFIG_CRASH_MAX_MEMORY_RANGES
Again, I don't think CONFIG_CRASH_MAX_MEMORY_RANGES makes sense, at all.

> +
>   /*
>    * NOTE: The addresses and sizes passed to this routine have
>    * already been fully aligned on page boundaries. There is no
> @@ -462,8 +463,7 @@ void arch_unmap_crash_pages(void **ptr)
>    * is prepared in a kernel buffer, and then it is written on top
>    * of the existing/old elfcorehdr.
>    */
> -void arch_crash_handle_hotplug_event(struct kimage *image,
> -	unsigned int hp_action)
> +void arch_crash_handle_hotplug_event(struct kimage *image, unsigned int hp_action)
>   {
>   	struct kexec_segment *ksegment;
>   	unsigned char *ptr = NULL;
> @@ -513,4 +513,5 @@ void arch_crash_handle_hotplug_event(struct kimage *image,
>   	if (elfbuf)
>   		vfree(elfbuf);
>   }
> -#endif
> +
> +#endif /* CONFIG_CRASH_MAX_MEMORY_RANGES */
> diff --git a/include/linux/kexec.h b/include/linux/kexec.h
> index a48577a36fb8..0f79ad4c4f80 100644
> --- a/include/linux/kexec.h
> +++ b/include/linux/kexec.h
> @@ -27,6 +27,19 @@ extern struct resource crashk_res;
>   extern struct resource crashk_low_res;
>   extern note_buf_t __percpu *crash_notes;
>   
> +/* Alignment required for elf header segment */
> +#define ELF_CORE_HEADER_ALIGN   4096
> +
> +struct crash_mem_range {
> +	u64 start, end;
> +};
> +
> +struct crash_mem {
> +	unsigned int max_nr_ranges;
> +	unsigned int nr_ranges;
> +	struct crash_mem_range ranges[];
> +};
> +
>   #ifdef CONFIG_KEXEC_CORE
>   #include <linux/list.h>
>   #include <linux/compat.h>
> @@ -237,19 +250,6 @@ static inline int arch_kexec_locate_mem_hole(struct kexec_buf *kbuf)
>   }
>   #endif
>   
> -/* Alignment required for elf header segment */
> -#define ELF_CORE_HEADER_ALIGN   4096
> -
> -struct crash_mem_range {
> -	u64 start, end;
> -};
> -
> -struct crash_mem {
> -	unsigned int max_nr_ranges;
> -	unsigned int nr_ranges;
> -	struct crash_mem_range ranges[];
> -};
> -
>   extern int crash_exclude_mem_range(struct crash_mem *mem,
>   				   unsigned long long mstart,
>   				   unsigned long long mend);
> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> index 5bc5159d9cb1..f6b5d835f826 100644
> --- a/kernel/crash_core.c
> +++ b/kernel/crash_core.c
> @@ -622,6 +622,15 @@ static int __init crash_save_vmcoreinfo_init(void)
>   subsys_initcall(crash_save_vmcoreinfo_init);
>   
>   #if defined(CONFIG_HOTPLUG_CPU) || defined(CONFIG_MEMORY_HOTPLUG)
> +
> +void __weak *arch_map_crash_pages(unsigned long paddr, unsigned long size)
> +{
> +	return NULL;
> +}
> +
> +void __weak arch_unmap_crash_pages(void **ptr) { }
> +void __weak arch_crash_handle_hotplug_event(struct kimage *image, unsigned int hp_action) { }
> +
I was asked by Baoquan He to eliminate the use of __weak, which I did. I followed the technique used 
by other kexec infrastructure.

>   /*
>    * To accurately reflect hot un/plug changes, the elfcorehdr (which
>    * is passed to the crash kernel via the elfcorehdr= parameter)
>