[PATCH -next] crash: Fix riscv64 crash memory reserve dead loop

Sun Aug 4 19:01:37 PDT 2024

On 2024/8/2 20:24, Alexandre Ghiti wrote:
> Hi Jinjie,
> 
> On 02/08/2024 11:01, Jinjie Ruan wrote:
>> On a RISCV64 QEMU machine with 512MB of memory, the cmdline
>> "crashkernel=500M,high" causes the system to stall as below:
>>
>>      Zone ranges:
>>        DMA32    [mem 0x0000000080000000-0x000000009fffffff]
>>        Normal   empty
>>      Movable zone start for each node
>>      Early memory node ranges
>>        node   0: [mem 0x0000000080000000-0x000000008005ffff]
>>        node   0: [mem 0x0000000080060000-0x000000009fffffff]
>>      Initmem setup node 0 [mem 0x0000000080000000-0x000000009fffffff]
>>     (stall here)
>>
>> commit 5d99cadf1568 ("crash: fix x86_32 crash memory reserve dead loop
> 
> 
> I can't find this revision; was this patch merged in 6.11?

Yes, it is in linux-next.

> 
> 
>> bug") fixed this on 32-bit architectures. However, the problem is not
>> completely solved. If `CRASH_ADDR_LOW_MAX = CRASH_ADDR_HIGH_MAX` on a
>> 64-bit architecture, for example when system memory is equal to
>> CRASH_ADDR_LOW_MAX on RISCV64, the following infinite loop also occurs:
>>
>>     -> reserve_crashkernel_generic() and high is true
>>        -> alloc at [CRASH_ADDR_LOW_MAX, CRASH_ADDR_HIGH_MAX] fails
>>           -> alloc at [0, CRASH_ADDR_LOW_MAX] fails, and this repeats
>>              forever (because CRASH_ADDR_LOW_MAX = CRASH_ADDR_HIGH_MAX).
>>
>> Before the refactor in commit 9c08a2a139fe ("x86: kdump: use generic
>> interface to simplify crashkernel reservation code"), x86 did not try to
>> reserve crash memory in low memory if the allocation above 4G failed.
>> However, before the refactor in commit fdc268232dbba ("arm64: kdump: use
>> generic interface to simplify crashkernel reservation"), arm64 did try to
>> reserve crash memory in low memory if the allocation above 4G failed. For
>> 64-bit systems, this fallback is less beneficial than the opposite one
>> (falling back from low to high memory), so remove it to fix this bug and
>> align with the native x86 implementation.
>>
>> After this patch, it prints:
>>     cannot allocate crashkernel (size:0x1f400000)
>>
>> Fixes: 39365395046f ("riscv: kdump: use generic interface to simplify crashkernel reservation")
> 
> 
> Your patch subject says "-next", but the commit above landed in 6.7, so
> I think we should merge this fix now. Let me know if I missed something.
> 
> Thanks,
> 
> Alex
> 
> 
>> Signed-off-by: Jinjie Ruan <ruanjinjie at huawei.com>
>> ---
>>   kernel/crash_reserve.c | 9 ---------
>>   1 file changed, 9 deletions(-)
>>
>> diff --git a/kernel/crash_reserve.c b/kernel/crash_reserve.c
>> index 5387269114f6..69e4b8b7b969 100644
>> --- a/kernel/crash_reserve.c
>> +++ b/kernel/crash_reserve.c
>> @@ -420,15 +420,6 @@ void __init reserve_crashkernel_generic(char *cmdline,
>>                   goto retry;
>>           }
>>
>> -        /*
>> -         * For crashkernel=size[KMG],high, if the first attempt was
>> -         * for high memory, fall back to low memory.
>> -         */
>> -        if (high && search_end == CRASH_ADDR_HIGH_MAX) {
>> -            search_end = CRASH_ADDR_LOW_MAX;
>> -            search_base = 0;
>> -            goto retry;
>> -        }
>>           pr_warn("cannot allocate crashkernel (size:0x%llx)\n",
>>               crash_size);
>>           return;
> 