[PATCH v2] arm64: hibernate: Fix level3 translation fault in swsusp_save()
Yaxiong Tian
13327272236 at 163.com
Tue Apr 16 19:13:16 PDT 2024
On 2024/4/13 01:30, Catalin Marinas wrote:
> For some reason I missed the updated patch.
>
> On Fri, Mar 01, 2024 at 10:19:24AM +0800, Yaxiong Tian wrote:
>> From: Yaxiong Tian <tianyaxiong at kylinos.cn>
>>
>> On ARM64 machines using UEFI, can_set_direct_map() returns false under
>> certain kernel build configs or boot parameters set in grub, for example
>> with CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT, CONFIG_KFENCE and
>> CONFIG_RODATA_FULL_DEFAULT_ENABLED all disabled, or with rodata=off and
>> debug_pagealloc=off set in grub and CONFIG_KFENCE disabled. In that case
>> swsusp_save() fails because no page table mapping can be found for the
>> NOMAP memory, such as:
> [...]
>> [ 48.532162] Call trace:
>> [ 48.532162] swsusp_save+0x280/0x538
>> [ 48.532162] swsusp_arch_suspend+0x148/0x190
>> [ 48.532162] hibernation_snapshot+0x240/0x39c
>> [ 48.532162] hibernate+0xc4/0x378
>> [ 48.532162] state_store+0xf0/0x10c
>> [ 48.532162] kobj_attr_store+0x14/0x24
>>
>> This issue can be reproduced in QEMU using UEFI when booting with
>> rodata=off and debug_pagealloc=off in grub and with CONFIG_KFENCE disabled.
>>
>> This is because in swsusp_save()->copy_data_pages()->page_is_saveable(),
>> kernel_page_present() presumes that a page is present when can_set_direct_map()
>> returns false, even for NOMAP ranges. So NOMAP pages are saved afterwards, which
>> then causes a level3 translation fault on these pages.
> I can see how kernel_page_present() ended up returning true if
> !can_set_direct_map(), though based on the function naming only, it
> feels a bit unintuitive. Is arm64 the only architecture making use of
> MEMBLOCK_NOMAP? Or is it the only one where kernel_page_present() also
> returns true if !can_set_direct_map()?
It looks like ARM64 is the only one where kernel_page_present() also returns
true if !can_set_direct_map(). As far as I remember, there are no NOMAP
regions on x86 machines, while on ARM64 machines these NOMAP regions are set
up by UEFI. I no longer remember the other details.
>> diff --git a/arch/arm64/kernel/hibernate.c b/arch/arm64/kernel/hibernate.c
>> index 02870beb271e..d90005de1d26 100644
>> --- a/arch/arm64/kernel/hibernate.c
>> +++ b/arch/arm64/kernel/hibernate.c
>> @@ -94,7 +94,7 @@ int pfn_is_nosave(unsigned long pfn)
>>  	unsigned long nosave_end_pfn = sym_to_pfn(&__nosave_end - 1);
>>
>>  	return ((pfn >= nosave_begin_pfn) && (pfn <= nosave_end_pfn)) ||
>> -		crash_is_nosave(pfn);
>> +		crash_is_nosave(pfn) || !pfn_is_map_memory(pfn);
>>  }
> This indeed fixes the problem but it looks like an arm64-specific
> workaround. I can see at least arm, loongarch and riscv making use of
> memblock_is_map_memory() (which is what pfn_is_map_memory() calls). Do
> they not have the same problem? On riscv, for example,
> kernel_page_present() does not depend on any ARCH_HAS_SET_DIRECT_MAP
> related options/conditions (neither does x86 though not sure it cares
> about MEMBLOCK_NOMAP). Should we do the same for arm64 and drop the
> !can_set_direct_map() condition in kernel_page_present()?
I dropped the !can_set_direct_map() condition in kernel_page_present() and
tested it. The test passed. Using a kretprobe on kernel_page_present(), I
found that it returns false for NOMAP pages, so saveable_page() skips these
pages. The final logic for processing these pages is the same as before
(v5.4).
I think this is a good approach.
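
To make the difference concrete, here is a small userspace sketch of the two
behaviours. It is only a model and not kernel code: every name in it
(model_page, model_pte_valid(), model_page_present_old/new(),
model_count_saved()) is made up for illustration, and the page table walk is
reduced to a single flag per page. It assumes the case discussed above, where
can_set_direct_map() returns false.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model types; none of these exist in the kernel. */
enum model_region { MODEL_REGION_NORMAL, MODEL_REGION_NOMAP };

struct model_page {
	enum model_region region; /* NOMAP pages have no linear-map entry */
};

/* Stand-in for can_set_direct_map(); false models rodata=off,
 * debug_pagealloc=off and CONFIG_KFENCE disabled. */
static bool model_can_set_direct_map = false;

/* Stand-in for the page table walk: only mapped pages have a valid entry. */
static bool model_pte_valid(const struct model_page *page)
{
	return page->region != MODEL_REGION_NOMAP;
}

/* Old behaviour: assume the page is present whenever the direct map
 * cannot be modified, without looking at the page tables. */
static bool model_page_present_old(const struct model_page *page)
{
	if (!model_can_set_direct_map)
		return true;
	return model_pte_valid(page);
}

/* Behaviour with the condition dropped: always consult the page tables. */
static bool model_page_present_new(const struct model_page *page)
{
	return model_pte_valid(page);
}

/* Simplified stand-in for the saveable check: count the pages that the
 * snapshot code would go on to copy. */
static size_t model_count_saved(const struct model_page *pages, size_t n,
				bool (*present)(const struct model_page *))
{
	size_t saved = 0;

	for (size_t i = 0; i < n; i++)
		if (present(&pages[i]))
			saved++;
	return saved;
}
```

In this model a NOMAP page is reported as present by the old check and would
be copied, which corresponds to the access that faults in swsusp_save(). With
the condition dropped it is reported as not present and skipped, which matches
what I observed with the kretprobe.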