qemu riscv, thead c906, Linux boot regression
Conor Dooley
conor at kernel.org
Wed Jan 24 05:49:12 PST 2024
On Wed, Jan 24, 2024 at 02:27:10PM +0100, Björn Töpel wrote:
> Conor Dooley <conor at kernel.org> writes:
>
> > On Wed, Jan 24, 2024 at 01:49:51PM +0100, Björn Töpel wrote:
> >> Hi!
> >>
> >> I bumped the RISC-V Linux kernel CI to use qemu 8.2.0, and realized that
> >> thead c906 didn't boot anymore. Bisection points to commit d6a427e2c0b2
> >> ("target/riscv/cpu.c: restrict 'marchid' value")
> >>
> >> Reverting that commit, or the hack below solves the boot issue:
> >>
> >> --8<--
> >> diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
> >> index 8cbfc7e781ad..e18596c8a55a 100644
> >> --- a/target/riscv/cpu.c
> >> +++ b/target/riscv/cpu.c
> >> @@ -505,6 +505,9 @@ static void rv64_thead_c906_cpu_init(Object *obj)
> >>      cpu->cfg.ext_xtheadsync = true;
> >>
> >>      cpu->cfg.mvendorid = THEAD_VENDOR_ID;
> >> +    cpu->cfg.marchid = ((QEMU_VERSION_MAJOR << 16) |
> >> +                        (QEMU_VERSION_MINOR << 8) |
> >> +                        (QEMU_VERSION_MICRO));
> >>  #ifndef CONFIG_USER_ONLY
> >>      set_satp_mode_max_supported(cpu, VM_1_10_SV39);
> >>  #endif
> >> --8<--
> >>
> >> I'm unsure what the correct qemu way of adding a default value is,
> >> or if c906 should have a proper marchid.
> >
> > The "correct" marchid/mimpid values for the c906 are zero.
>
> Ok! Thanks for clearing that up for me.
>
> > I haven't looked into the code at all, so I am "assuming" that it is
> > being zero initialised at present. Linux applies the errata fixups for
> > the c906 when archid and impid are both zero - so your patch will avoid
> > these fixups being applied.
>
> I'm also assuming 0, -- will double-check. Hmm, that means that the
> *previous* marchid was incorrect (pre d6a427e2c0b2).
>
> > Do you think that perhaps the emulation in QEMU does not support what
> > the kernel uses once the errata fixups are enabled?
>
> Did a quick look at the c906 "in_asm,int" logs:
>
> | 0x80201040: 12000073 sfence.vma zero,zero
> | 0x80201044: 18051073 csrrw zero,satp,a0
> |
> | riscv_cpu_do_interrupt: hart:0, async:0, cause:000000000000000c, epc:0x0000000080201048, tval:0x0000000080201048, desc=exec_page_fault
> | riscv_cpu_do_interrupt: hart:0, async:0, cause:000000000000000c, epc:0xffffffff80001048, tval:0xffffffff80001048, desc=exec_page_fault
> | ...cont forever
>
> So it looks like we're tripping over the page tables, when we're turning
> on paging.
>
> Hmm, maybe it's not qemu, but the c906 that has been broken for a while?
I didn't know what you meant by "not qemu, but the c906", so I went and
boot tested my d1 nezha. On today's next (6.8.0-rc1-next-20240124) it
booted into my initramfs with no problems. Obviously my config is
unlikely to match yours, but this seems like a core thing that should be
hit regardless of config.

So perhaps this is a c906-in-QEMU problem? Lacking emulation for
something the kernel uses, perhaps? I know nothing about the capabilities
of its emulation in QEMU, so I am of no help there.

Cheers,
Conor.
>
> I'll disable it temporarily from CI anyhow, and will continue digging.
>
>
> Thanks for the pointers/clarifications, Conor!
> Björn
>
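
For readers following the thread, the interaction between the marchid
hack and the errata fixups can be sketched in C. This is a minimal
illustration, not kernel or QEMU code: both function names are
hypothetical, the vendor check is an assumption about how the fixups are
selected (the thread only states the archid/impid condition), and 0x5b7
is assumed to be the value behind THEAD_VENDOR_ID.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the T-Head vendor ID reported in mvendorid. */
#define THEAD_VENDOR_ID 0x5b7

/*
 * Hypothetical helper: builds a marchid value the same way the hack in
 * the quoted diff does, from the QEMU major/minor/micro version numbers.
 */
static uint64_t qemu_version_marchid(unsigned major, unsigned minor,
                                     unsigned micro)
{
    return ((uint64_t)major << 16) | ((uint64_t)minor << 8) | micro;
}

/*
 * Hypothetical helper: simplified form of the condition described in
 * the thread. The c906 fixups are applied only when marchid and mimpid
 * are both zero (and, by assumption, the vendor is T-Head).
 */
static bool c906_errata_fixups_apply(uint64_t mvendorid, uint64_t marchid,
                                     uint64_t mimpid)
{
    return mvendorid == THEAD_VENDOR_ID && marchid == 0 && mimpid == 0;
}
```

With QEMU 8.2.0 the hack yields marchid 0x080200, which is non-zero, so
under this sketch the fixups are skipped. With the zero marchid/mimpid
that the c906 is expected to report, the condition holds and the fixups
are applied.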