[PATCH v10 0/5] support reserving crashkernel above 4G on arm64 kdump

John Donnelly john.p.donnelly at oracle.com
Mon Jul 27 08:38:54 EDT 2020


On 7/3/20 3:38 AM, chenzhou wrote:
> Hi Bhupesh,
>
>
> On 2020/7/3 15:26, Bhupesh Sharma wrote:
>> Hi Chen,
>>
>> On Fri, Jul 3, 2020 at 9:24 AM Chen Zhou <chenzhou10 at huawei.com> wrote:
>>> This patch series enables reserving crashkernel above 4G on arm64.
>>>
>>> There are the following issues in arm64 kdump:
>>> 1. We use crashkernel=X to reserve crashkernel below 4G, which will fail
>>> when there is not enough low memory.
>>> 2. Currently, crashkernel=Y@X can be used to reserve crashkernel above 4G.
>>> In this case, if swiotlb or DMA buffers are required, the crash dump kernel
>>> will fail to boot because there is no low memory available for allocation.
>>> 3. commit 1a8e1cef7603 ("arm64: use both ZONE_DMA and ZONE_DMA32") broke
>>> the arm64 kdump. If the memory reserved for the crash dump kernel falls in
>>> ZONE_DMA32, devices in the crash dump kernel that need ZONE_DMA will fail
>>> to allocate.
>>>
>>> To solve these issues, introduce crashkernel=X,low to reserve the specified
>>> size of low memory.
>>> crashkernel=X tries to reserve memory for the crash dump kernel under
>>> 4G. If crashkernel=Y,low is specified simultaneously, first reserve the
>>> specified size of low memory for crash dump kernel devices and then reserve
>>> memory above 4G.
>>>
>>> When crashkernel is reserved above 4G in memory and crashkernel=X,low
>>> is specified simultaneously, the kernel should reserve the specified size of
>>> low memory for crash dump kernel devices. So there may be two crash kernel
>>> regions, one below 4G and the other above 4G.
>>> To distinguish it from the high region without affecting the use of
>>> kexec-tools, rename the low region to "Crash kernel (low)", and pass the
>>> low region by reusing the DT property "linux,usable-memory-range". We made the
>>> low memory region the last range of "linux,usable-memory-range" to keep
>>> compatibility with existing user-space and older kdump kernels.
>>>
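As an aside for readers of the thread, here is a minimal userspace sketch of the
reservation policy as the cover letter words it. It is only a model under stated
assumptions: `struct range`, `find_free()` and `reserve_crashkernel_sketch()` are
hypothetical names, not functions from the patches, and alignment and memblock
details are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SZ_4G (1ULL << 32)

/* Hypothetical model of a memory range, [start, end). */
struct range {
    uint64_t start;
    uint64_t end;
};

/* Result of the sketch: a main region and an optional low region. */
struct crash_res {
    bool has_main, has_low;
    struct range main, low;
};

/* First-fit search for `size` bytes inside [min, max); false if nothing fits. */
static bool find_free(const struct range *avail, size_t n, uint64_t size,
                      uint64_t min, uint64_t max, struct range *out)
{
    for (size_t i = 0; i < n; i++) {
        uint64_t lo = avail[i].start > min ? avail[i].start : min;
        uint64_t hi = avail[i].end < max ? avail[i].end : max;

        if (hi > lo && hi - lo >= size) {
            out->start = lo;
            out->end = lo + size;
            return true;
        }
    }
    return false;
}

/*
 * crashkernel=X alone: reserve X below 4G.
 * crashkernel=X crashkernel=Y,low: reserve Y below 4G first, then X above 4G.
 * low_size == 0 stands for ",low" not being given (an assumption of this model).
 */
static bool reserve_crashkernel_sketch(const struct range *avail, size_t n,
                                       uint64_t size, uint64_t low_size,
                                       struct crash_res *res)
{
    assert(size > 0);
    res->has_main = res->has_low = false;

    if (low_size == 0) {
        res->has_main = find_free(avail, n, size, 0, SZ_4G, &res->main);
        return res->has_main;
    }

    if (!find_free(avail, n, low_size, 0, SZ_4G, &res->low))
        return false;
    res->has_low = true;

    res->has_main = find_free(avail, n, size, SZ_4G, UINT64_MAX, &res->main);
    if (!res->has_main)
        res->has_low = false; /* give the low region back on failure */
    return res->has_main;
}
```

With 1G free below 4G and 4G free above it, this model accepts crashkernel=512M,
rejects crashkernel=2G on its own, and accepts crashkernel=2G crashkernel=256M,low
by placing the 2G region above 4G.
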
>>> Besides, we need to modify kexec-tools:
>>> arm64: support more than one crash kernel regions (see [1])
>>>
>>> Another update is the documentation of DT property 'linux,usable-memory-range':
>>> schemas: update 'linux,usable-memory-range' node schema (see [2])
>>>
>>> The previous changes and discussions can be retrieved from:
>>>
>>> Changes since [v9]
>>> - Patch 1 add Acked-by from Dave.
>>> - Update patch 5 according to Dave's comments.
>>> - Update chosen schema.
>>>
>>> Changes since [v8]
>>> - Reuse DT property "linux,usable-memory-range".
>>> Suggested by Rob, reuse DT property "linux,usable-memory-range" to pass the low
>>> memory region.
>>> - Fix kdump broken with ZONE_DMA reintroduced.
>>> - Update chosen schema.
>>>
>>> Changes since [v7]
>>> - Move x86 CRASH_ALIGN to 2M
>>> As suggested by Dave, and after some testing, move x86 CRASH_ALIGN to 2M.
>>> - Update Documentation/devicetree/bindings/chosen.txt.
>>> Add corresponding documentation to Documentation/devicetree/bindings/chosen.txt
>>> as suggested by Arnd.
>>> - Add Tested-by from John and pk.
>>>
>>> Changes since [v6]
>>> - Fix build errors reported by kbuild test robot.
>>>
>>> Changes since [v5]
>>> - Move reserve_crashkernel_low() into kernel/crash_core.c.
>>> - Delete crashkernel=X,high.
>>> - Modify crashkernel=X,low.
>>> If crashkernel=X,low is specified simultaneously, first reserve the specified
>>> size of low memory for crash dump kernel devices and then reserve memory above 4G.
>>> In addition, rename crashk_low_res to "Crash kernel (low)" for arm64, and then
>>> pass it to the crash dump kernel via DT property "linux,low-memory-range".
>>> - Update Documentation/admin-guide/kdump/kdump.rst.
>>>
>>> Changes since [v4]
>>> - Reimplement memblock_cap_memory_ranges for multiple ranges by Mike.
>>>
>>> Changes since [v3]
>>> - Add memblock_cap_memory_ranges back for multiple ranges.
>>> - Fix some compiling warnings.
>>>
>>> Changes since [v2]
>>> - Split patch "arm64: kdump: support reserving crashkernel above 4G" as
>>> two. Put "move reserve_crashkernel_low() into kexec_core.c" in a separate
>>> patch.
>>>
>>> Changes since [v1]:
>>> - Move common reserve_crashkernel_low() code into kernel/kexec_core.c.
>>> - Remove memblock_cap_memory_ranges() I added in v1 and implement that
>>> in fdt_enforce_memory_region().
>>> There are at most two crash kernel regions. In the two-region case,
>>> we cap the memory range to [min(regs[*].start), max(regs[*].end)]
>>> and then remove the memory range in the middle.
>>>
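The capping step from the v1 changelog entry above can be pictured with a small
sketch. This is not the code from any version of the series; `struct mem_range`
and `cap_and_hole()` are hypothetical names used only to show the arithmetic:
take the enclosing span of the regions, then carve out the gap between them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a memory range, [start, end). */
struct mem_range {
    uint64_t start;
    uint64_t end;
};

/*
 * For one or two usable regions, compute the enclosing span
 * [min(start), max(end)) in *cap. In the two-region case, also compute
 * the gap between the regions in *hole, which has to be removed again.
 * Returns 1 if a hole was found, 0 otherwise.
 */
static int cap_and_hole(const struct mem_range *regs, size_t n,
                        struct mem_range *cap, struct mem_range *hole)
{
    assert(n >= 1 && n <= 2);

    cap->start = regs[0].start;
    cap->end = regs[0].end;
    for (size_t i = 1; i < n; i++) {
        if (regs[i].start < cap->start)
            cap->start = regs[i].start;
        if (regs[i].end > cap->end)
            cap->end = regs[i].end;
    }

    if (n == 2) {
        /* Order the two regions by start address before taking the gap. */
        const struct mem_range *lo =
            regs[0].start <= regs[1].start ? &regs[0] : &regs[1];
        const struct mem_range *hi = lo == &regs[0] ? &regs[1] : &regs[0];

        if (lo->end < hi->start) {
            hole->start = lo->end;
            hole->end = hi->start;
            return 1;
        }
    }
    return 0;
}
```

For a high region at [0x100000000, 0x180000000) and a low region at
[0x80000000, 0x90000000), the span is [0x80000000, 0x180000000) and the hole to
remove is [0x90000000, 0x100000000).
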
>>> [1]: http://lists.infradead.org/pipermail/kexec/2020-June/020737.html
>>> [2]: https://github.com/robherring/dt-schema/pull/19
>>> [v1]: https://lkml.org/lkml/2019/4/2/1174
>>> [v2]: https://lkml.org/lkml/2019/4/9/86
>>> [v3]: https://lkml.org/lkml/2019/4/9/306
>>> [v4]: https://lkml.org/lkml/2019/4/15/273
>>> [v5]: https://lkml.org/lkml/2019/5/6/1360
>>> [v6]: https://lkml.org/lkml/2019/8/30/142
>>> [v7]: https://lkml.org/lkml/2019/12/23/411
>>> [v8]: https://lkml.org/lkml/2020/5/21/213
>>> [v9]: https://lkml.org/lkml/2020/6/28/73
>>>
>>> Chen Zhou (5):
>>>    x86: kdump: move reserve_crashkernel_low() into crash_core.c
>>>    arm64: kdump: reserve crashkenel above 4G for crash dump kernel
>>>    arm64: kdump: add memory for devices by DT property
>>>      linux,usable-memory-range
>>>    arm64: kdump: fix kdump broken with ZONE_DMA reintroduced
>>>    kdump: update Documentation about crashkernel on arm64
>>>
>>>   Documentation/admin-guide/kdump/kdump.rst     | 14 ++-
>>>   .../admin-guide/kernel-parameters.txt         | 17 +++-
>>>   arch/arm64/kernel/setup.c                     |  8 +-
>>>   arch/arm64/mm/init.c                          | 74 ++++++++++++---
>>>   arch/x86/kernel/setup.c                       | 66 ++------------
>>>   include/linux/crash_core.h                    |  3 +
>>>   include/linux/kexec.h                         |  2 -
>>>   kernel/crash_core.c                           | 90 +++++++++++++++++++
>>>   kernel/kexec_core.c                           | 17 ----
>>>   9 files changed, 197 insertions(+), 94 deletions(-)
>>>
>>> --
>>> 2.20.1
>> Thanks for the v10.
>>
>> 1. It seems this series is still broken on arm64 boards like Ampere and
>> ThunderX2 (Marvell) because of the ZONE_DMA32-related OOM seen while
>> booting the kdump kernel.
>> Here are details about my environment:
>>
>> - Latest upstream Linus master branch (5.8.0-rc3) + your v10 patches.
>> - Latest upstream kexec-tools + your v4 patch.
>>
>> # dmesg | grep -i crash
>> [    0.000000] crashkernel reserved: 0x00000000ca000000 - 0x00000000ea000000 (512 MB)
>> [    0.000000] Kernel command line: BOOT_IMAGE=(hd13,gpt2)/vmlinuz-5.8.0-rc3+ root=/dev/mapper/rhel_hpe--apache--cn99xx--09-root ro rd.lvm.lv=rhel_hpe-apache-cn99xx-09/root rd.lvm.lv=rhel_hpe-apache-cn99xx-09/swap crashkernel=512M
>> [   58.917523]     crashkernel=512M
>>
>> 2. Here is the OOM crash seen while booting the kdump kernel:
>>
>> [    0.244724] DMA: preallocated 128 KiB GFP_KERNEL pool for atomic allocations
>> [    0.251859] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000188
>> [    0.260737] Mem abort info:
>> [    0.263553]   ESR = 0x96000006
>> [    0.266632]   EC = 0x25: DABT (current EL), IL = 32 bits
>> [    0.271994]   SET = 0, FnV = 0
>> [    0.275074]   EA = 0, S1PTW = 0
>> [    0.278239] Data abort info:
>> [    0.281141]   ISV = 0, ISS = 0x00000006
>> [    0.285010]   CM = 0, WnR = 0
>> [    0.288001] [0000000000000188] user address but active_mm is swapper
>> [    0.294420] Internal error: Oops: 96000006 [#1] SMP
>> [    0.299344] Modules linked in:
>> [    0.302424] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.8.0-rc3+ #8
>> [    0.308753] Hardware name: HPE Apollo 70             /C01_APACHE_MB          , BIOS L50_5.13_1.11 06/18/2019
>> [    0.318599] pstate: 00400009 (nzcv daif +PAN -UAO BTYPE=--)
>> [    0.324228] pc : mem_cgroup_get_nr_swap_pages+0x2c/0x60
>> [    0.329506] lr : shrink_lruvec+0x404/0x4f8
>> [    0.333638] sp : fffffe0012b8f840
>> [    0.336979] x29: fffffe0012b8f840 x28: fffffe00116b3000
>> [    0.342343] x27: fffffe0012b8fb00 x26: 0000000000000020
>> [    0.347707] x25: 0000000000000000 x24: fffffc0069fffe28
>> [    0.353070] x23: 0000000000000000 x22: 0000000000000000
>> [    0.358433] x21: 000000000000003c x20: fffffe0012b8fa98
>> [    0.363796] x19: 0000000000000000 x18: 0000000000000010
>> [    0.369159] x17: 00000000bd8afee8 x16: 000000001260aa76
>> [    0.374523] x15: ffffffffffffffff x14: fffffe00116b3988
>> [    0.379886] x13: fffffe0092b8faa7 x12: fffffe0012b8faaf
>> [    0.385248] x11: fffffe00116f1000 x10: fffffe0012b8fa30
>> [    0.390612] x9 : fffffe0010244ebc x8 : 0000000000000000
>> [    0.395975] x7 : 0000000000000020 x6 : 00000000ffff8ae3
>> [    0.401338] x5 : 0000000000000000 x4 : fffffc004da89000
>> [    0.406701] x3 : 0000000000000000 x2 : 0000000000000000
>> [    0.412064] x1 : fffffe00116bf000 x0 : 0000000000000000
>> [    0.417427] Call trace:
>> [    0.419891]  mem_cgroup_get_nr_swap_pages+0x2c/0x60
>> [    0.424815]  shrink_node+0x1a8/0x688
>> [    0.428420]  do_try_to_free_pages+0xe8/0x448
>> [    0.432729]  try_to_free_pages+0x110/0x230
>> [    0.436863]  __alloc_pages_slowpath.constprop.106+0x2b8/0xb48
>> [    0.442666]  __alloc_pages_nodemask+0x2ac/0x2f8
>> [    0.447239]  alloc_page_interleave+0x20/0x90
>> [    0.451548]  alloc_pages_current+0xdc/0xf8
>> [    0.455681]  atomic_pool_expand+0x60/0x210
>> [    0.459817]  __dma_atomic_pool_init+0x50/0xa4
>> [    0.464214]  dma_atomic_pool_init+0xac/0x158
>> [    0.468522]  do_one_initcall+0x50/0x218
>> [    0.472393]  kernel_init_freeable+0x22c/0x2d0
>> [    0.476792]  kernel_init+0x18/0x110
>> [    0.480310]  ret_from_fork+0x10/0x18
>> [    0.483918] Code: 350001e3 d503201f f9450024 1400000a (f940c401)
>> [    0.490074] ---[ end trace e5a9147af159e580 ]---
>> [    0.494734] Kernel panic - not syncing: Fatal exception
>> [    0.500010] Rebooting in 10 seconds..
>>
>> 3. Did you test your patch with a simple crashkernel=512M command line
>> (without using the crashkernel hi/lo or crashkernel=X@Y format)?
>>
>> Anyway, since this implementation still needs rework, we can go ahead
>> with the arrangement of limiting the crashkernel allocation to the
>> ZONE_DMA range (as I suggested in another patch series
>> <http://lists.infradead.org/pipermail/kexec/2020-July/020777.html>) in
>> the meantime, to ensure the upstream kernel can still support kdump
>> on arm64 boards where it was working before the ZONE_DMA32 changes
>> were introduced for arm64.
>>
>> Please let me know your views,
> Thanks for testing and sharing your views. I have no questions about points 1 and 2 you mentioned.
>
> I clarify the issue in my patch 4 and suggest using parameters like
> "crashkernel=X crashkernel=Y,low" if CONFIG_ZONE_DMA is enabled.
> I also document this in patch 5.
>
> I chose to address the issue on top of "reserving crashkernel above 4G",
> because we just need to adjust the low memory limit instead of limiting the
> whole crashkernel to ZONE_DMA.
> details: https://lkml.org/lkml/2020/7/3/64
>
> But you are right, arm64 kdump has been broken for a long time, including the issue you addressed in
> "Append new variables to vmcoreinfo (TCR_EL1.T1SZ for arm64 and MAX_PHYSMEM_BITS for all archs)".
>
> I agree with you that we should make it work as soon as possible.
>
> Ping James, Will,
> any other comments about this patch series?
>
> Thanks,
> Chen Zhou
>

Hi James and Will,


   This patch set has been in review for over a year, since May of 2019.
   What is holding up getting this accepted?
