[PATCH v12 7/7] x86/crash: Add x86 crash hotplug support
Eric DeVolder
eric.devolder at oracle.com
Fri Sep 30 08:36:49 PDT 2022
On 9/28/22 11:07, Borislav Petkov wrote:
> On Tue, Sep 13, 2022 at 02:12:31PM -0500, Eric DeVolder wrote:
>> This topic was discussed previously https://lkml.org/lkml/2022/3/3/372.
>
> Please do not use lkml.org to refer to lkml messages. We have a
> perfectly fine archival system at lore.kernel.org. You simply do
>
> https://lore.kernel.org/r/<Message-ID>
>
> when you want to point to a previous mail.
ok, thanks for pointing that out to me.
>
>> David points out that terminology is tricky here due to differing behaviors.
>> And perhaps that is your point in asking for guidance text. It can be
>> complicated
>
> Which means you need an explanation how to use this even more.
>
> And why is CONFIG_CRASH_MAX_MEMORY_RANGES even a Kconfig item and not
> something you discover from the hardware?
No, is the short answer.
>
> Your help text talks about System RAM entries in /proc/iomem which means
> that those entries are present somewhere in the kernel and you can read
> them out and do the proper calculations dynamically instead of doing the
> static CONFIG_NR_CPUS_DEFAULT + CONFIG_CRASH_MAX_MEMORY_RANGES thing.
The intent is to compute the max size buffer needed to contain a maximum populated elfcorehdr, which
is primarily based on the number of CPUs and memory regions. Thus far I (and others involved) have
not found a kernel method to determine the maximum number of memory regions possible (if you are
aware of one, please let me know!). Thus CONFIG_CRASH_MAX_MEMORY_RANGES was born (rather borrowed
from kexec-tools).
So no dynamic computation is possible, yet.
>
>> , but it all comes down to System RAM entries.
>>
>> I could perhaps offer an overly simplified example such that for 1GiB block
>> size, for example, the CRASH_MAX_MEMORY_RANGES of 32768 would allow for 32TiB
>> of memory?
>
> Yes, and stick it somewhere in Documentation/admin-guide/kdump/ and
> refer to it in that help text so that people can find it and read how to
> use your new option.
>
ok
>> The kbuf.bufsz value is obtained via a call to prepare_elf_headers(); I can
>> not initialize it at its declaration.
>
> Sorry, I meant this:
>
> diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
> index 8fc7d678ac72..ee6fd9f1b2b9 100644
> --- a/arch/x86/kernel/crash.c
> +++ b/arch/x86/kernel/crash.c
> @@ -395,8 +395,9 @@ int crash_load_segments(struct kimage *image)
> if (ret)
> return ret;
>
> - image->elf_headers = kbuf.buffer;
> - image->elf_headers_sz = kbuf.bufsz;
> + image->elf_headers = kbuf.buffer;
> + image->elf_headers_sz = kbuf.bufsz;
> + kbuf.memsz = kbuf.bufsz;
>
> #if defined(CONFIG_HOTPLUG_CPU) || defined(CONFIG_MEMORY_HOTPLUG)
> /* Ensure elfcorehdr segment large enough for hotplug changes */
> @@ -407,9 +408,8 @@ int crash_load_segments(struct kimage *image)
> image->elf_headers_sz = kbuf.memsz;
> image->elfcorehdr_index = image->nr_segments;
> image->elfcorehdr_index_valid = true;
> -#else
> - kbuf.memsz = kbuf.bufsz;
> #endif
> +
> kbuf.buf_align = ELF_CORE_HEADER_ALIGN;
> kbuf.mem = KEXEC_BUF_MEM_UNKNOWN;
> ret = kexec_add_buffer(&kbuf);
>
ok
>> I'm at a loss as to what to do differently here. You've raised this issue
>> before and I went back and looked at the suggestions then and I don't see
>> how that applies to this situation. How is this situation different than the
>> #ifdef CONFIG_KEXEC_FILE that immediately preceeds it?
>
> See the diff at the end. I'm not saying this is how you should do it
> but it should give you a better idea. The logic being, the functions
> in the .c file don't really need ifdeffery around them - you're adding
> 1-2 functions and crash.c is not that big - so they can be built in
> unconditionally. You'd need the ifdeffery *in the header only* when
> crash.c is not being built.
ok; I've overlooked that scenario.
>
> But I've done it with ifdeffery in the .c file now because yes, the
> kexec code is a minefield of ifdeffery. Hell, there's ifdeffery even in
> the headers for structs. Ifdeffery you don't really need. Someone should
> clean that up and simplify this immensely.
ok
>
>> Currently there is a concurrent effort for PPC support by Sourabh
>> Jain, and in that effort arch_map_crash_pages() is using __va(paddr).
>
> Why?
>
>> I do not know the nuances between kmap_local_page() and __va() to
>> answer the question.
>
> kmap_local_page() is a generic interface and it should work on any arch.
>
> And it is documented even:
>
> $ git grep kmap_local_page Documentation/
>
>> If kmap_local_page() works for all archs, then I'm happy to drop these
>> arch_ variants and use it directly.
>
> Yes, pls do.
I'll check with Sourabh to see if PPC can work with kmap_local_page().
>
> ---
>
> diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
> index 432073385b2d..b73c9628cd85 100644
> --- a/arch/x86/include/asm/kexec.h
> +++ b/arch/x86/include/asm/kexec.h
> @@ -205,6 +205,17 @@ void *arch_kexec_kernel_image_load(struct kimage *image);
>
> int arch_kimage_file_post_load_cleanup(struct kimage *image);
> #define arch_kimage_file_post_load_cleanup arch_kimage_file_post_load_cleanup
> +
> +#ifdef CONFIG_CRASH_MAX_MEMORY_RANGES
So I think the use of CONFIG_CRASH_MAX_MEMORY_RANGES is not correct; it still needs to be based on
the cpu or memory hotplug options.
> +void *arch_map_crash_pages(unsigned long paddr, unsigned long size);
> +void arch_unmap_crash_pages(void **ptr);
> +void arch_crash_handle_hotplug_event(struct kimage *image, unsigned int hp_action);
> +#else
> +void *arch_map_crash_pages(unsigned long paddr, unsigned long size) { return NULL; }
> +void arch_unmap_crash_pages(void **ptr) { }
> +void arch_crash_handle_hotplug_event(struct kimage *image, unsigned int hp_action) { }
> +#endif
> +
> #endif
> #endif
>
> diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
> index 8fc7d678ac72..a526c893abe8 100644
> --- a/arch/x86/kernel/crash.c
> +++ b/arch/x86/kernel/crash.c
> @@ -395,8 +395,9 @@ int crash_load_segments(struct kimage *image)
> if (ret)
> return ret;
>
> - image->elf_headers = kbuf.buffer;
> - image->elf_headers_sz = kbuf.bufsz;
> + image->elf_headers = kbuf.buffer;
> + image->elf_headers_sz = kbuf.bufsz;
> + kbuf.memsz = kbuf.bufsz;
>
> #if defined(CONFIG_HOTPLUG_CPU) || defined(CONFIG_MEMORY_HOTPLUG)
> /* Ensure elfcorehdr segment large enough for hotplug changes */
> @@ -407,9 +408,8 @@ int crash_load_segments(struct kimage *image)
> image->elf_headers_sz = kbuf.memsz;
> image->elfcorehdr_index = image->nr_segments;
> image->elfcorehdr_index_valid = true;
> -#else
> - kbuf.memsz = kbuf.bufsz;
> #endif
> +
> kbuf.buf_align = ELF_CORE_HEADER_ALIGN;
> kbuf.mem = KEXEC_BUF_MEM_UNKNOWN;
> ret = kexec_add_buffer(&kbuf);
> @@ -425,7 +425,8 @@ int crash_load_segments(struct kimage *image)
> }
> #endif /* CONFIG_KEXEC_FILE */
>
> -#if defined(CONFIG_HOTPLUG_CPU) || defined(CONFIG_MEMORY_HOTPLUG)
> +#ifdef CONFIG_CRASH_MAX_MEMORY_RANGES
Again, I don't think CONFIG_CRASH_MAX_MEMORY_RANGES makes sense, at all.
> +
> /*
> * NOTE: The addresses and sizes passed to this routine have
> * already been fully aligned on page boundaries. There is no
> @@ -462,8 +463,7 @@ void arch_unmap_crash_pages(void **ptr)
> * is prepared in a kernel buffer, and then it is written on top
> * of the existing/old elfcorehdr.
> */
> -void arch_crash_handle_hotplug_event(struct kimage *image,
> - unsigned int hp_action)
> +void arch_crash_handle_hotplug_event(struct kimage *image, unsigned int hp_action)
> {
> struct kexec_segment *ksegment;
> unsigned char *ptr = NULL;
> @@ -513,4 +513,5 @@ void arch_crash_handle_hotplug_event(struct kimage *image,
> if (elfbuf)
> vfree(elfbuf);
> }
> -#endif
> +
> +#endif /* CONFIG_CRASH_MAX_MEMORY_RANGES */
> diff --git a/include/linux/kexec.h b/include/linux/kexec.h
> index a48577a36fb8..0f79ad4c4f80 100644
> --- a/include/linux/kexec.h
> +++ b/include/linux/kexec.h
> @@ -27,6 +27,19 @@ extern struct resource crashk_res;
> extern struct resource crashk_low_res;
> extern note_buf_t __percpu *crash_notes;
>
> +/* Alignment required for elf header segment */
> +#define ELF_CORE_HEADER_ALIGN 4096
> +
> +struct crash_mem_range {
> + u64 start, end;
> +};
> +
> +struct crash_mem {
> + unsigned int max_nr_ranges;
> + unsigned int nr_ranges;
> + struct crash_mem_range ranges[];
> +};
> +
> #ifdef CONFIG_KEXEC_CORE
> #include <linux/list.h>
> #include <linux/compat.h>
> @@ -237,19 +250,6 @@ static inline int arch_kexec_locate_mem_hole(struct kexec_buf *kbuf)
> }
> #endif
>
> -/* Alignment required for elf header segment */
> -#define ELF_CORE_HEADER_ALIGN 4096
> -
> -struct crash_mem_range {
> - u64 start, end;
> -};
> -
> -struct crash_mem {
> - unsigned int max_nr_ranges;
> - unsigned int nr_ranges;
> - struct crash_mem_range ranges[];
> -};
> -
> extern int crash_exclude_mem_range(struct crash_mem *mem,
> unsigned long long mstart,
> unsigned long long mend);
> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> index 5bc5159d9cb1..f6b5d835f826 100644
> --- a/kernel/crash_core.c
> +++ b/kernel/crash_core.c
> @@ -622,6 +622,15 @@ static int __init crash_save_vmcoreinfo_init(void)
> subsys_initcall(crash_save_vmcoreinfo_init);
>
> #if defined(CONFIG_HOTPLUG_CPU) || defined(CONFIG_MEMORY_HOTPLUG)
> +
> +void __weak *arch_map_crash_pages(unsigned long paddr, unsigned long size)
> +{
> + return NULL;
> +}
> +
> +void __weak arch_unmap_crash_pages(void **ptr) { }
> +void __weak arch_crash_handle_hotplug_event(struct kimage *image, unsigned int hp_action) { }
> +
I was asked by Baoquan He to eliminate the use of __weak, which I did. I followed the technique used
by other kexec infrastructure.
> /*
> * To accurately reflect hot un/plug changes, the elfcorehdr (which
> * is passed to the crash kernel via the elfcorehdr= parameter)
>
More information about the kexec
mailing list