Question about Address Range Validation in Crash Kernel Allocation

Dave Young dyoung at redhat.com
Thu Mar 21 03:06:18 PDT 2024


Hi,

On Thu, 21 Mar 2024 at 17:49, Li Huafei <lihuafei1 at huawei.com> wrote:
>
> Hi Baoquan,
>
> On 2024/3/21 17:17, chenhaixiang (A) wrote:
> >
> >>> I'm sorry for the delay. Here are some details from the boot log and
> >> /proc/iomem:
> >>> The Boot log:
> >>> [    0.000000] Linux version 6.8.0 (root at localhost.localdomain) (gcc (GCC)
> >> 10.3.1, GNU ld (GNU Binutils) 2.37) #3 SMP PREEMPT_DYNAMIC Wed Mar 20
> >> 11:46:11 UTC 2024
> >>> [    0.000000] Command line: BOOT_IMAGE=/vmlinuz-6.8.0
> >> root=/dev/mapper/root ro crashkernel=512M resume=/dev/mapper/swap
> >> rd.lvm.lv=root rd.lvm.lv=swap crash_kexec_post_notifiers softlockup_panic=1
> >> reserve_kbox_mem=16M fsck.mode=auto fsck.repair=yes panic=3
> >> nmi_watchdog=1 quiet rd.shell=0 memblock=debug efi=debug
> >> console=ttyS0,115200n8 console=tty0
> >> ......snip...
> >>> [    0.022622] memblock_phys_alloc_range: 536870912 bytes align=0x1000000
> >> from=0x0000000000000000 max_addr=0x0000000100000000
> >> reserve_crashkernel_generic+0x7c/0x220
> >>> [    0.022628] memblock_phys_alloc_range: 536870912 bytes align=0x1000000
> >> from=0x0000000100000000 max_addr=0x0000400000000000
> >> reserve_crashkernel_generic+0x7c/0x220
> >>> [    0.022632] memblock_reserve: [0x000000c01f000000-0x000000c03effffff]
> >> memblock_alloc_range_nid+0xee/0x170
> >>> [    0.022634] memblock_phys_alloc_range: 268435456 bytes align=0x1000000
> >> from=0x0000000000000000 max_addr=0x0000000100000000
> >> reserve_crashkernel_generic+0x11d/0x220
> >>> [    0.022638] memblock_reserve: [0x0000000049000000-0x0000000058ffffff]
> >> memblock_alloc_range_nid+0xee/0x170
> >>> [    0.022640] crashkernel low memory reserved: 0x49000000 - 0x59000000
> >> (256 MB)
> >>> [    0.022641] crashkernel reserved: 0x000000c01f000000 -
> >> 0x000000c03f000000 (512 MB)
> >>
> >> Here, crashkernel,low is reserved in region:  [0x49000000 - 0x59000000] (256
> >> MB)
> >>       crashkernel,high is reserved in region: [0x000000c01f000000 -
> >> 0x000000c03f000000] (512 MB) ......
> >>> [    0.029839] memblock_reserve: [0x000000c03ffff740-0x000000c03fffff7f]
> >> memblock_alloc_range_nid+0xee/0x170
> >>> [    0.029843] e820: update [mem 0x53cbd000-0x53ccffff] usable ==>
> >> reserved
> >>> [    0.029861] TSC deadline timer available
> >>
> >> Then here, region [0x53cbd000-0x53ccffff] is reserved in e820, which prints the
> >> "usable ==> reserved" line above. This should be the step which prevents the
> >> earlier reserved crashkernel,low from being added to the iomem tree. I am not
> >> sure what triggered the e820 update.
>
> We added a dump_stack() call in efi_mem_reserve() and found that
> [0x53cbd000-0x53ccffff] was reserved by BGRT:
>
>   [    0.032259] e820: update [mem 0x53cbd000-0x53ccffff] usable ==>
> reserved
>   [    0.032262] CPU: 0 PID: 0 Comm: swapper Not tainted
> 5.10.0-60.18.0.50.h820.eulerosv2r11.x86_64 #7
>   [    0.032263] Hardware name: Huawei 2288H V5/BC11SPSCB0, BIOS 8.25
> 08/30/2022
>   [    0.032264] Call Trace:
>   [    0.032265]  ? dump_stack+0x57/0x6e
>   [    0.032267]  ? bgrt_init+0xc2/0xc2
>   [    0.032268]  ? __e820__range_update+0x7a/0x1d6
>   [    0.032270]  ? bgrt_init+0xc2/0xc2
>   [    0.032272]  ? bgrt_init+0xc2/0xc2
>   [    0.032274]  ? efi_arch_mem_reserve+0x1a3/0x1d0
>   [    0.032276]  ? efi_mem_reserve+0x2d/0x42
>   [    0.032278]  ? acpi_parse_bgrt+0xa/0x11
>   [    0.032279]  ? acpi_table_parse+0x86/0xbc
>   [    0.032281]  ? acpi_boot_init+0x79/0xad
>   [    0.032282]  ? setup_arch+0x835/0x954
>   [    0.032284]  ? start_kernel+0x5d/0x455
>   [    0.032286]  ? secondary_startup_64_no_verify+0xc2/0xcb
>
> efi_reserve_boot_services() has reserved memory of type
> EFI_BOOT_SERVICES_CODE & EFI_BOOT_SERVICES_DATA before crashkernel.
> efi_bgrt_init() assumes that EFI_BOOT_SERVICES_DATA is not reserved by
> other modules. Then, the e820_table is directly updated, and the BGRT
> memory is reserved.
>
> However, memblock_is_region_reserved() in efi_reserve_boot_services()
> returns true even when the ranges only partially overlap.
>
>      already_reserved = memblock_is_region_reserved(start, size);

Do you mean efi_reserve_boot_services() is supposed to reserve the BGRT
memory but does not reserve it because the region overlaps with
some other reserved region?  If so, can you debug and find exactly which
memblock reserved region overlaps with the BGRT?
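
To make the overlap question concrete, here is a minimal userspace sketch
(not kernel code). It assumes the check behaves as described in the quoted
mail, i.e. it reports "reserved" as soon as two ranges share any byte. The
names struct range, ranges_overlap(), range_contains() and first_overlap()
are hypothetical helpers for illustration only; the addresses are taken
from the boot log quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for one reserved range; not the kernel's own type. */
struct range {
	uint64_t base;
	uint64_t size;
};

/* True if [base1, base1+size1) and [base2, base2+size2) share at least one byte. */
static bool ranges_overlap(uint64_t base1, uint64_t size1,
			   uint64_t base2, uint64_t size2)
{
	return base1 < base2 + size2 && base2 < base1 + size1;
}

/* True only if the inner range lies entirely inside the outer range. */
static bool range_contains(uint64_t outer_base, uint64_t outer_size,
			   uint64_t inner_base, uint64_t inner_size)
{
	return inner_base >= outer_base &&
	       inner_base + inner_size <= outer_base + outer_size;
}

/* Index of the first reserved range overlapping [base, base+size), or -1. */
static int first_overlap(const struct range *reserved, int n,
			 uint64_t base, uint64_t size)
{
	for (int i = 0; i < n; i++) {
		if (ranges_overlap(reserved[i].base, reserved[i].size, base, size))
			return i;
	}
	return -1;
}

/* Ranges from the log: crashkernel,low and the region updated in e820. */
static void check_log_ranges(void)
{
	const uint64_t ck_low_base = 0x49000000ULL, ck_low_size = 0x10000000ULL;
	const uint64_t bgrt_base = 0x53cbd000ULL, bgrt_size = 0x13000ULL;

	/* The BGRT range falls inside the 256 MB crashkernel,low reservation. */
	assert(ranges_overlap(ck_low_base, ck_low_size, bgrt_base, bgrt_size));
	assert(range_contains(ck_low_base, ck_low_size, bgrt_base, bgrt_size));
}
```

Running something like first_overlap() over the memblock reserved list (for
example from the memblock=debug output) against each boot services region
would show which reservation makes the check return true.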

BTW, the previous emails in this thread are not threaded correctly,
which makes it hard to find information.

>
>      /*
>       * Because the following memblock_reserve() is paired
>       * with memblock_free_late() for this region in
>       * efi_free_boot_services(), we must be extremely
>       * careful not to reserve, and subsequently free,
>       * critical regions of memory (like the kernel image) or
>       * those regions that somebody else has already
>       * reserved.
>       *
>       * A good example of a critical region that must not be
>       * freed is page zero (first 4Kb of memory), which may
>       * contain boot services code/data but is marked
>       * E820_TYPE_RESERVED by trim_bios_range().
>       */
>      if (!already_reserved) {
>              memblock_reserve(start, size);
>
>              /*
>               * If we are the first to reserve the region, no
>               * one else cares about it. We own it and can
>               * free it later.
>               */
>              if (can_free_region(start, size))
>                      continue;
>      }
>
> As a result, some memory of EFI_BOOT_SERVICES_DATA is not reserved in
> advance. The subsequent crashkernel happens to reserve this portion of
> memory, which conflicts with BGRT.
>
> > Current analysis suggests that efi_reserve_boot_services() is causing the update of the e820 table.
> >
> >>
> >> How do you boot into your new 6.8.0 kernel? Used kexec -l to jump into the 2nd
> >> kernel, or reboot from bios/firmware boot up into 6.8.0?
> > It was a reboot from BIOS into 6.8.0. I tried reverting the patch below,
> > and this time the conflicting segment "53cbd000-53ccffff" also appeared in the
> > /proc/iomem of the 6.8 kernel.
> >
> > 2d4fd058-60efefff : System RAM
> >   2d4fd058-58ffffff : System RAM
> >     49000000-58ffffff : Crash kernel
> >       53cbd000-53ccffff : Reserved
> > 60eff000-704fefff : Reserved
> > --
> >   93dd424000-93dd9fffff : Kernel bss
> >   c01f000000-c03effffff : Crash kernel
> > d0000000000-d0fffffffff : PCI Bus 0000:00
> >   d0000000000-d00001fffff : PCI Bus 0000:01
> >>
> >> Reverting below commit should fix your problem, can you try it?
> >>
> >> commit 4a693ce65b186fddc1a73621bd6f941e6e3eca21
> >> Author: Huacai Chen <chenhuacai at kernel.org>
> >> Date:   Fri Dec 29 16:02:13 2023 +0800
> >>
> >>     kdump: defer the insertion of crashkernel resources
> >
> > .
> >
>