qemu riscv, thead c906, Linux boot regression
Björn Töpel
bjorn at kernel.org
Wed Jan 24 05:27:10 PST 2024
Conor Dooley <conor at kernel.org> writes:
> On Wed, Jan 24, 2024 at 01:49:51PM +0100, Björn Töpel wrote:
>> Hi!
>>
>> I bumped the RISC-V Linux kernel CI to use qemu 8.2.0, and realized that
>> thead c906 didn't boot anymore. Bisection points to commit d6a427e2c0b2
>> ("target/riscv/cpu.c: restrict 'marchid' value")
>>
>> Reverting that commit, or the hack below solves the boot issue:
>>
>> --8<--
>> diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
>> index 8cbfc7e781ad..e18596c8a55a 100644
>> --- a/target/riscv/cpu.c
>> +++ b/target/riscv/cpu.c
>> @@ -505,6 +505,9 @@ static void rv64_thead_c906_cpu_init(Object *obj)
>>      cpu->cfg.ext_xtheadsync = true;
>>
>>      cpu->cfg.mvendorid = THEAD_VENDOR_ID;
>> +    cpu->cfg.marchid = ((QEMU_VERSION_MAJOR << 16) |
>> +                        (QEMU_VERSION_MINOR << 8) |
>> +                        (QEMU_VERSION_MICRO));
>>  #ifndef CONFIG_USER_ONLY
>>      set_satp_mode_max_supported(cpu, VM_1_10_SV39);
>>  #endif
>> --8<--
>>
>> I'm unsure what the correct qemu way of adding a default value is,
>> or if c906 should have a proper marchid.
>
> The "correct" marchid/mimpid values for the c906 are zero.
Ok! Thanks for clearing that up for me.
> I haven't looked into the code at all, so I am "assuming" that it is
> being zero initialised at present. Linux applies the errata fixups for
> the c906 when archid and impid are both zero - so your patch will avoid
> these fixups being applied.
I'm also assuming 0; will double-check. Hmm, that means that the
*previous* marchid was incorrect (pre d6a427e2c0b2).
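
To make the arithmetic concrete, here is a minimal C sketch of what the
hack above yields for qemu 8.2.0, and why a non-zero value would skip the
fixups. The helper names are hypothetical, and the check is only a model
of the condition Conor describes (archid and impid both zero), not the
kernel's actual code. It also assumes the pre-d6a427e2c0b2 default used
the same version packing as the hack.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same packing as the hack above: major.minor.micro in one integer. */
uint64_t qemu_version_marchid(unsigned major, unsigned minor, unsigned micro)
{
    return ((uint64_t)major << 16) | ((uint64_t)minor << 8) | (uint64_t)micro;
}

/*
 * Hypothetical model of the condition described in this thread:
 * the c906 errata fixups apply only when both IDs read as zero.
 */
bool c906_errata_would_apply(uint64_t marchid, uint64_t mimpid)
{
    return marchid == 0 && mimpid == 0;
}
```

With these helpers, qemu_version_marchid(8, 2, 0) is 0x080200, so
c906_errata_would_apply(0x080200, 0) is false, while
c906_errata_would_apply(0, 0) is true.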
> Do you think that perhaps the emulation in QEMU does not support what
> the kernel uses once the errata fixups are enabled?
Did a quick look at the c906 "in_asm,int" logs:
| 0x80201040: 12000073 sfence.vma zero,zero
| 0x80201044: 18051073 csrrw zero,satp,a0
|
| riscv_cpu_do_interrupt: hart:0, async:0, cause:000000000000000c, epc:0x0000000080201048, tval:0x0000000080201048, desc=exec_page_fault
| riscv_cpu_do_interrupt: hart:0, async:0, cause:000000000000000c, epc:0xffffffff80001048, tval:0xffffffff80001048, desc=exec_page_fault
| ...cont forever
So it looks like we're tripping over the page tables, when we're turning
on paging.
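
For anyone reading the log along, a small C sketch that decodes the two
pieces of information in those lines: the cause code (0xc is exception
code 12, instruction page fault, in the RISC-V privileged spec) and the
Sv39 page-table indices of the faulting address. The function and struct
names are made up for illustration; only the bit layout comes from the
spec.

```c
#include <stdint.h>

/* Names for the page-fault exception codes in the privileged spec. */
const char *riscv_exc_name(uint64_t cause)
{
    switch (cause) {
    case 12: return "instruction page fault";
    case 13: return "load page fault";
    case 15: return "store/AMO page fault";
    default: return "other";
    }
}

/* Sv39 virtual address fields: three 9-bit VPNs and a 12-bit offset. */
struct sv39_va {
    unsigned vpn2;   /* bits 38..30, index into the root table */
    unsigned vpn1;   /* bits 29..21 */
    unsigned vpn0;   /* bits 20..12 */
    unsigned offset; /* bits 11..0  */
};

struct sv39_va sv39_split(uint64_t va)
{
    struct sv39_va f;

    f.vpn2 = (unsigned)((va >> 30) & 0x1ff);
    f.vpn1 = (unsigned)((va >> 21) & 0x1ff);
    f.vpn0 = (unsigned)((va >> 12) & 0x1ff);
    f.offset = (unsigned)(va & 0xfff);
    return f;
}
```

For the second fault, sv39_split(0xffffffff80001048) gives vpn2=510,
vpn1=0, vpn0=1, offset=0x48, so that is the root-table slot to inspect
when checking what the walk sees for the kernel mapping.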
Hmm, maybe it's not qemu, but the c906 that has been broken for a while?
I'll disable it temporarily from CI anyhow, and will continue digging.
Thanks for the pointers/clarifications, Conor!
Björn