Bug report: kernel panicked while booting

Tue Jun 6 03:04:01 PDT 2023

On 2023/6/5 22:25, Alexandre Ghiti wrote:
> Hi Song,
> 
> On Mon, Jun 5, 2023 at 12:52 PM Song Shuai <songshuaishuai at tinylab.org> wrote:
>>
>> Description of problem:
>>
>> When booting Linux with the RiscVVirtQemu edk2 firmware, a Store/AMO page fault was trapped and triggered a kernel panic.
>> The entire log has been posted at this link: https://termbin.com/nga4.
>>
>> You can reproduce it with the following steps:
>>
>> 1. Prepare the environment with
>>     - QEMU virt: v8.0.0 (with OpenSBI v1.2)
>>     - edk2: at commit (2bc8545883 "UefiCpuPkg/CpuPageTableLib: Reduce the number of random tests")
>>     - Linux: v6.4-rc1 and later versions
>>
>> 2. Start the QEMU virt board
>>
>> ```sh
>> $ cat ~/8_riscv/start_latest.sh
>> #!/bin/bash
>> /home/song/8_riscv/3_acpi/qemu/ooo/usr/local/bin/qemu-system-riscv64 \
>>          -s -nographic -drive file=/home/song/8_riscv/3_acpi/Build_virt/RiscVVirtQemu/RELEASE_GCC5/FV/RISCV_VIRT.fd,if=pflash,format=raw,unit=1 \
>>          -machine virt,acpi=off -smp 2 -m 2G \
>>          -kernel /home/song/9_linux/linux/00_rv_def/arch/riscv/boot/Image \
>>          -initrd /home/song/8_riscv/3_acpi/buildroot/output/images/rootfs.ext2 \
>>          -append "root=/dev/ram ro console=ttyS0 earlycon=uart8250,mmio,0x10000000 efi=debug loglevel=8 memblock=debug" ## also panic by memtest
>> ```
>> 3. Then you will encounter the kernel panic logged in the above link
>>
>> Other Information:
>>
>> 1. -------
>>
>> This report is not identical to my prior report -- "kernel paniced when system hibernates" [1], but both of them
>> are closely related to commit 3335068f8721 ("riscv: Use PUD/P4D/PGD pages for the linear mapping").
>>
>> With this commit, hibernation traps with an "access fault" when it accesses the PMP-protected region (mmode_resv0@80000000)
>> owned by OpenSBI (BTW, hibernation is marked as nonportable by Conor [2]).
>>
>> In this report, efi_init hands off the memory map from Boot Services to memblock, which reserves mmode_resv0@80000000,
>> so the result is a "page fault" rather than an "access fault".
>>
>> And reverting commit 3335068f8721 indeed fixed this panic.
>>
>> 2. -------
>>
>> As the gdb-pt-dump [3] tool shows, the PTE covering the faulting virtual address had the appropriate store permission.
>> Is there another way to trigger the "Store/AMO page fault"? Or did the creation of the linear mapping in commit 3335068f8721 do something wrong?
>>
>> ```
>> (gdb) p/x $satp
>> $1 = 0xa000000000081708
>> (gdb) pt -satp 0xa000000000081708
>>               Address :     Length   Permissions
>>    0xff1bfffffea39000 :     0x1000 | W:1 X:0 R:1 S:1
>>    0xff1bfffffebf9000 :     0x1000 | W:1 X:0 R:1 S:1
>>    0xff1bfffffec00000 :   0x400000 | W:1 X:0 R:1 S:1
>>    0xff60000000000000 :   0x1c0000 | W:1 X:0 R:1 S:1
>>    0xff60000000200000 :   0xa00000 | W:0 X:0 R:1 S:1
>>    0xff60000000c00000 : 0x7f000000 | W:1 X:0 R:1 S:1  // badaddr: ff6000007fdb1000
>>    0xff6000007fdc0000 :    0x3d000 | W:1 X:0 R:1 S:1
>>    0xff6000007ffbf000 :     0x1000 | W:1 X:0 R:1 S:1
>>    0xffffffff80000000 :   0xc00000 | W:0 X:1 R:1 S:1
>>    0xffffffff80c00000 :   0xa00000 | W:1 X:0 R:1 S:1
>>
>> ```
>>
>> 3. ------
>>
>> You can also reproduce a similar panic by appending "memtest" to the kernel cmdline.
>> I have posted the memtest boot log at this link: https://termbin.com/1twl.
>>
>> Please correct me if I'm wrong.
>>
>> [1]: https://lore.kernel.org/linux-riscv/CAAYs2=gQvkhTeioMmqRDVGjdtNF_vhB+vm_1dHJxPNi75YDQ_Q@mail.gmail.com/
>> [2]: https://lore.kernel.org/linux-riscv/20230526-astride-detonator-9ae120051159@wendy/
>> [3]: https://github.com/martinradev/gdb-pt-dump
> 
> Thanks for the thorough report, really appreciated.
> 
> So there are multiple issues here:
> 
> - the first one is that the memory region for opensbi is marked as not
> cacheable in the efi memory map, and then this region is not mapped in
> the linear mapping:
> [    0.000000] efi:   0x000080000000-0x00008003ffff [Reserved    |   |
>   |  |  |  |  |  |  |  |   |  |  |  |UC]
> 
> - the second one (that I feel a bit ashamed of...) is that I did not
> check the alignment of the virtual address when choosing the map size
> in best_map_size() and then we end up trying to map a physical region
> aligned on 2MB that is actually not aligned on 2MB virtually because
> the opensbi region is not mapped at all.
> 
Issue 2 should be the root cause of this panic.

Here is my understanding of why a linear PMD mapping needs a 2M-aligned
VA. Please correct me if I'm wrong.

I added logging to the `create_linear_mapping_range()` function.

```
song # lowmem region: [0x0000000081800000 -- 0x00000000ffe3d000], va: 0xff6000007fbc0000, pa: 0x00000000ffc00000, map_size: 200000 ,pg: e7
song # lowmem region: [0x0000000081800000 -- 0x00000000ffe3d000], va: 0xff6000007fdc0000, pa: 0x00000000ffe00000, map_size: 1000 ,pg: e7
```

The PA `0x00000000ffc00000` of this PMD mapping is aligned to PMD_SIZE,
but the VA `0xff6000007fbc0000` is not.
Because `pmd_index()` drops the low 21 bits of the VA, this 2M PA region
is actually mapped at the effective VA region
`[0xff6000007fa00000,0xff6000007fc00000)`.
Any access to the VA hole between the end of that effective region and
the start VA of the next 4K mapping (`0xff6000007fdc0000`) will fault.

In this report, the memtest fault VA (`0xff6000007fc00000`) and the
boot-time fault VA (`0xff6000007fdb1000`) both lie in this VA hole.
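
The arithmetic can be checked with a short sketch. It models only the
alignment behaviour, not the real page-table walk, and it assumes
PMD_SIZE is 2 MiB. The helpers `pmd_effective_range()` and `in_hole()`
are hypothetical names for illustration, not kernel functions; the
addresses are taken from the log and the fault reports above.

```python
PMD_SIZE = 0x200000  # assumed: 2 MiB PMD mappings on riscv64


def pmd_effective_range(va):
    """Return the [start, end) VA range a PMD leaf really covers.

    A PMD entry is selected by the VA bits above the low 21 bits, so an
    unaligned VA is effectively rounded down to a PMD_SIZE boundary.
    """
    start = va & ~(PMD_SIZE - 1)
    return start, start + PMD_SIZE


# Values from the create_linear_mapping_range() log.
va_of_2m_mapping = 0xFF6000007FBC0000
va_of_next_4k_mapping = 0xFF6000007FDC0000

start, end = pmd_effective_range(va_of_2m_mapping)
hole = (end, va_of_next_4k_mapping)  # unmapped VA range [end, next 4K VA)


def in_hole(addr):
    """True if addr falls in the unmapped gap after the PMD mapping."""
    return hole[0] <= addr < hole[1]


print(hex(start), hex(end))
print(in_hole(0xFF6000007FC00000))  # memtest fault VA
print(in_hole(0xFF6000007FDB1000))  # boot-time fault VA
```

Both fault addresses fall in the gap because the 2M leaf covers
`0x1c0000` bytes below the intended VA and stops `0x1c0000` bytes short
of where the following 4K mappings begin.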

With commit 3335068f8721 reverted, the kernel load address is always
offset by PMD_SIZE, and kernel_map.va_pa_offset and MIN_MEMBLOCK_ADDR
follow it. So a linear PMD mapping always gets a 2M-aligned VA, which
is why the revert works.
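
For illustration, a map-size chooser that checks the alignment of both
addresses, as issue 2 calls for, might look like the sketch below. It
is a Python model and an assumption on my part, not the actual kernel
patch. It handles only the PMD and 4K cases and omits the larger
PUD/P4D/PGD sizes, and this `best_map_size()` signature is hypothetical.

```python
PAGE_SIZE = 0x1000   # 4 KiB base page
PMD_SIZE = 0x200000  # assumed: 2 MiB PMD mappings on riscv64


def best_map_size(pa, va, size):
    """Pick a mapping size for the region starting at (pa, va).

    A 2M leaf is used only if BOTH the physical and the virtual address
    are 2M-aligned and at least 2M remains; otherwise fall back to 4K.
    """
    both_aligned = ((pa | va) & (PMD_SIZE - 1)) == 0
    if both_aligned and size >= PMD_SIZE:
        return PMD_SIZE
    return PAGE_SIZE


# Case from the log: PA is 2M-aligned, VA is not, so 4K must be used.
print(hex(best_map_size(0xFFC00000, 0xFF6000007FBC0000, 0x23D000)))

# If the VA were 2M-aligned as well, a 2M mapping would be fine.
print(hex(best_map_size(0xFFC00000, 0xFF6000007FA00000, 0x23D000)))
```

With such a check the region in this report would fall back to 4K
pages, which avoids the hole but is the slow case described in the
third issue below.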

> - the possible third one is that we should not map the linear mapping
> using 4K pages, this would be slow in my opinion, and I think we
> should waste a bit of memory to align va and pa on a 2MB boundary.
I also noticed this one.
> 
> So I'll fix the second issue, and possibly the third one, and if no
Thanks for your attention to this report. I look forward to your fix.
> one looks into why the opensbi region is mapped in UC, I'll take a
> look at edk2.
> 
> Sorry for that,
> 
> Alex
> 

-- 
Song Shuai
Thanks