Bug report: kernel panicked when system hibernates

JeeHeng Sia jeeheng.sia at starfivetech.com
Tue May 16 02:55:23 PDT 2023


Hi Song,

Thanks for the investigation. Indeed, the exposure of the PMP reserved region to the kernel page table is causing the problem.
Here is a similar report: https://groups.google.com/u/0/a/groups.riscv.org/g/sw-dev/c/ITXwaKfA6z8

Thanks
Regards
Jee Heng

> -----Original Message-----
> From: Song Shuai <suagrfillet at gmail.com>
> Sent: Tuesday, May 16, 2023 5:24 PM
> To: alexghiti at rivosinc.com; robh at kernel.org; Andrew Jones <ajones at ventanamicro.com>; anup at brainfault.org;
> palmer at rivosinc.com; JeeHeng Sia <jeeheng.sia at starfivetech.com>; Leyfoon Tan <leyfoon.tan at starfivetech.com>; Mason Huo
> <mason.huo at starfivetech.com>; Paul Walmsley <paul.walmsley at sifive.com>; Conor Dooley <conor.dooley at microchip.com>; Guo
> Ren <guoren at kernel.org>
> Cc: linux-riscv at lists.infradead.org; linux-kernel at vger.kernel.org
> Subject: Bug report: kernel panicked when system hibernates
> 
> Description of problem:
> 
> The latest hibernation support[1] of RISC-V Linux produced a kernel panic.
> The entire log has been posted at this link: https://termbin.com/sphl .
> 
> How reproducible:
> 
> You can reproduce it with the following steps:
> 
> 1. prepare the environment with
> - QEMU virt machine v8.0.0 (with OpenSBI v1.2)
> - Linux v6.4-rc1
> 
> 2. start the QEMU virt machine
> ```sh
> $ cat ~/8_riscv/start_latest.sh
> #!/bin/bash
> /home/song/8_riscv/3_acpi/qemu/ooo/usr/local/bin/qemu-system-riscv64 \
> -smp 2 -m 4G -nographic -machine virt \
> -kernel /home/song/9_linux/linux/00_rv_test/arch/riscv/boot/Image \
> -append "root=/dev/vda ro earlycon=uart8250,mmio,0x10000000 early_ioremap_debug console=ttyS0 loglevel=8 memblock=debug no_console_suspend audit=0 3" \
> -drive file=/home/song/8_riscv/fedora/stage4-disk.img,format=raw,id=hd0 \
> -device virtio-blk-device,drive=hd0 \
> -drive file=/home/song/8_riscv/fedora/adisk.qcow2,format=qcow2,id=hd1 \
> -device virtio-blk-device,drive=hd1 \
> -gdb tcp::1236 #-S
> ```
> 3. execute hibernation
> 
> ```sh
> swapon /dev/vdb2 # this is my swap disk
> 
> echo disk > /sys/power/state
> ```
> 
> 4. Then you will encounter the kernel panic logged at the link above.
> 
> 
> Other Information:
> 
> From my initial, incomplete investigation, commit 3335068f8721
> ("riscv: Use PUD/P4D/PGD pages for the linear mapping")[2]
> is closely related to this panic. This commit uses a redefined
> `MIN_MEMBLOCK_ADDR` to discover the entire system memory
> and moves the base used for `va_pa_offset` from `kernel_map.phys_addr` to
> `phys_ram_base` for the linear memory mapping.
> 
> If the firmware passes its own memory region (for example, a
> PMP-protected region in OpenSBI) without the "no-map" property,
> this commit results in the firmware memory being directly mapped by
> `create_linear_mapping_page_table()`.
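
To make the address arithmetic concrete, here is a minimal userspace sketch (not kernel code) of the linear-map translation described above. The constants are read off the ptdump output in this report; the helper name `linear_va_to_pa` and the macro names are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Values taken from the ptdump output in this report. */
#define PAGE_OFFSET      0xff60000000000000ULL /* start of the linear mapping */
#define PHYS_RAM_BASE    0x0000000080000000ULL /* start of RAM (firmware lives here) */
#define KERNEL_PHYS_ADDR 0x0000000080200000ULL /* physical load address of the kernel */

/*
 * Translate a linear-map virtual address to a physical address, given the
 * physical base that PAGE_OFFSET is anchored to:
 *   va_pa_offset = PAGE_OFFSET - base;  pa = va - va_pa_offset
 */
static uint64_t linear_va_to_pa(uint64_t va, uint64_t base)
{
    assert(va >= PAGE_OFFSET); /* only linear-map addresses are meaningful here */
    uint64_t va_pa_offset = PAGE_OFFSET - base;
    return va - va_pa_offset;
}
```

With the base at `KERNEL_PHYS_ADDR`, `PAGE_OFFSET` translates to 0x80200000 (the kernel image). With the base at `PHYS_RAM_BASE`, the same virtual address translates to 0x80000000, which is the firmware memory shown in the first ptdump entry.
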
> 
> We can see the mapping via ptdump (the first linear-mapping entry is the firmware memory):
> ```
> ---[ Linear mapping ]---
> 0xff60000000000000-0xff60000000200000 0x0000000080000000 2M PMD D A G . . W R V ------------- the firmware memory
> 0xff60000000200000-0xff60000000c00000 0x0000000080200000 10M PMD D A G . . . R V
> 0xff60000000c00000-0xff60000001000000 0x0000000080c00000 4M PMD D A G . . W R V
> 0xff60000001000000-0xff60000001600000 0x0000000081000000 6M PMD D A G . . . R V
> 0xff60000001600000-0xff60000040000000 0x0000000081600000 1002M PMD D A G . . W R V
> 0xff60000040000000-0xff60000100000000 0x00000000c0000000 3G PUD D A G . . W R V
> ---[ Modules/BPF mapping ]---
> ---[ Kernel mapping ]---
> 0xffffffff80000000-0xffffffff80a00000 0x0000000080200000 10M PMD D A G . X . R V
> 0xffffffff80a00000-0xffffffff80c00000 0x0000000080c00000 2M PMD D A G . . . R V
> 0xffffffff80c00000-0xffffffff80e00000 0x0000000080e00000 2M PMD D A G . . W R V
> 0xffffffff80e00000-0xffffffff81400000 0x0000000081000000 6M PMD D A G . . . R V
> 0xffffffff81400000-0xffffffff81800000 0x0000000081600000 4M PMD
> ```
> 
> In the hibernation process, `swsusp_save()` calls
> `copy_data_pages(&copy_bm, &orig_bm)` to copy the pages tracked by
> these two memory bitmaps. The Oops (load access fault) occurred while
> copying the page at `PAGE_OFFSET` (which maps the firmware memory).
> 
> I also did two other tests:
> Test1:
> 
> Hibernation works in a kernel with commit 3335068f8721 reverted,
> at least in the current environment.
> 
> Test2:
> 
> I built a simple kernel module that reads the value at the
> `PAGE_OFFSET` address, and the same panic occurred with a load
> access fault.
> So hibernation does not seem to be the only way to trigger this panic.
> 
> Finally, should the firmware memory always carry the
> `MEMBLOCK_NOMAP` flag, through changes in either Linux or OpenSBI (at
> least in the current environment)? Or are there other suggestions?
> 
> Please correct me if I'm wrong.
> 
> [1]: https://lore.kernel.org/r/20230330064321.1008373-5-jeeheng.sia@starfivetech.com
> [2]: https://lore.kernel.org/r/20230324155421.271544-4-alexghiti@rivosinc.com
> 
> --
> Thanks,
> Song

