[PATCH -next] crash: Fix riscv64 crash memory reserve dead loop

Baoquan He bhe at redhat.com
Thu Aug 8 18:56:25 PDT 2024


On 08/08/24 at 03:56pm, Jinjie Ruan wrote:
> 
> 
> On 2024/8/7 3:34, Catalin Marinas wrote:
> > On Tue, Aug 06, 2024 at 08:10:30PM +0100, Catalin Marinas wrote:
> >> On Fri, Aug 02, 2024 at 06:11:01PM +0800, Baoquan He wrote:
> >>> On 08/02/24 at 05:01pm, Jinjie Ruan wrote:
> >>>> On RISCV64 Qemu machine with 512MB memory, cmdline "crashkernel=500M,high"
> >>>> will cause system stall as below:
> >>>>
> >>>> 	 Zone ranges:
> >>>> 	   DMA32    [mem 0x0000000080000000-0x000000009fffffff]
> >>>> 	   Normal   empty
> >>>> 	 Movable zone start for each node
> >>>> 	 Early memory node ranges
> >>>> 	   node   0: [mem 0x0000000080000000-0x000000008005ffff]
> >>>> 	   node   0: [mem 0x0000000080060000-0x000000009fffffff]
> >>>> 	 Initmem setup node 0 [mem 0x0000000080000000-0x000000009fffffff]
> >>>> 	(stall here)
> >>>>
> >>>> commit 5d99cadf1568 ("crash: fix x86_32 crash memory reserve dead loop
> >>>> bug") fixed this on 32-bit architectures. However, the problem is not
> >>>> completely solved. If `CRASH_ADDR_LOW_MAX = CRASH_ADDR_HIGH_MAX` on a
> >>>> 64-bit architecture, for example, when system memory is equal to
> >>>> CRASH_ADDR_LOW_MAX on RISCV64, the following infinite loop will also occur:
> >>>
> >>> Interesting, I didn't expect RISC-V to define them like this.
> >>>
> >>> #define CRASH_ADDR_LOW_MAX              dma32_phys_limit
> >>> #define CRASH_ADDR_HIGH_MAX             memblock_end_of_DRAM()
> >>
> >> arm64 defines the high limit as PHYS_MASK+1, it doesn't need to be
> >> dynamic and x86 does something similar (SZ_64T). Not sure why the
> >> generic code and riscv define it like this.
> >>
> >>>> 	-> reserve_crashkernel_generic() and high is true
> >>>> 	   -> alloc at [CRASH_ADDR_LOW_MAX, CRASH_ADDR_HIGH_MAX] fail
> >>>> 	      -> alloc at [0, CRASH_ADDR_LOW_MAX] fail and repeatedly
> >>>> 	         (because CRASH_ADDR_LOW_MAX = CRASH_ADDR_HIGH_MAX).
> >>>>
> >>>> Before the refactor in commit 9c08a2a139fe ("x86: kdump: use generic
> >>>> interface to simplify crashkernel reservation code"), x86 did not try to
> >>>> reserve crash memory low if it failed to allocate above 4G. However,
> >>>> before the refactor in commit fdc268232dbba ("arm64: kdump: use generic
> >>>> interface to simplify crashkernel reservation"), arm64 did try to reserve
> >>>> crash memory low if it failed above 4G. For 64-bit systems, this fallback
> >>>> is less beneficial than the opposite one, so remove it to fix this bug
> >>>> and align with the native x86 implementation.
> >>>
> >>> And I don't like the idea that a crashkernel=,high failure falls back to
> >>> an attempt in the low area, so this looks good to me.
> >>
> >> Well, I kind of liked this behaviour. One can specify ,high as a
> >> preference rather than forcing a range. The arm64 land has different
> >> platforms with some constrained memory layouts. Such fallback works well
> >> as a default command line option shipped with distros without having to
> >> guess the SoC memory layout.
> > 
> > I haven't tried but it's possible that this patch also breaks those
> > arm64 platforms with all RAM above 4GB when CRASH_ADDR_LOW_MAX is
> > memblock_end_of_DRAM(). Here all memory would be low and, in the absence
> > of a fallback, it fails to allocate.
> > 
> > So, my strong preference would be to re-instate the current behaviour
> > and work around the infinite loop in a different way.
> 
> Hi Baoquan, what's your opinion?
> 
> Should only this patch be reverted, or all 3 dead loop fix patches?

I am not sure which approach Catalin is suggesting.

Hi Catalin,

Could you elaborate on your preference so that Jinjie can
proceed accordingly?

Thanks
Baoquan
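
For reference, the retry sequence described in the quoted commit message
can be modelled roughly as below. This is a simplified user-space sketch,
not the kernel source: fake_alloc(), count_attempts() and MAX_ATTEMPTS are
hypothetical names, the allocator stand-in always fails (as in the reported
500M-out-of-512M case), and the attempt cap exists only so the model
terminates instead of hanging.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_ATTEMPTS 16	/* hypothetical cap so the model terminates */

/*
 * Hypothetical stand-in for the early allocator: always fails, modelling
 * a request too large to fit in any range.
 */
static uint64_t fake_alloc(uint64_t base, uint64_t end)
{
	(void)base;
	(void)end;
	return 0;
}

/*
 * Model of the ",high" path: try [low_max, high_max] first and, on
 * failure, fall back to [0, low_max]. Returns the number of allocation
 * attempts made, capped at MAX_ATTEMPTS.
 */
int count_attempts(uint64_t low_max, uint64_t high_max)
{
	uint64_t search_base = low_max;
	uint64_t search_end = high_max;
	int attempts = 0;

	assert(low_max <= high_max);

	while (attempts < MAX_ATTEMPTS) {
		attempts++;
		if (fake_alloc(search_base, search_end))
			break;

		/*
		 * Fallback check: "was the failed attempt the high one?"
		 * When low_max == high_max this stays true after the
		 * fallback, so the model retries until the cap is hit.
		 */
		if (search_end == high_max) {
			search_base = 0;
			search_end = low_max;
			continue;
		}
		break;
	}
	return attempts;
}
```

With distinct limits, for example count_attempts(0x100000000ULL,
0x400000000000ULL), the model stops after two attempts. With equal limits,
for example count_attempts(0xa0000000ULL, 0xa0000000ULL), it only stops
because of the cap, which corresponds to the stall in the report.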



