[PATCH v3 0/2] RISC-V: KVM: VCPU reset fixes

Fri May 23 01:08:26 PDT 2025

On Fri, May 23, 2025 at 12:47 PM Radim Krčmář <rkrcmar at ventanamicro.com> wrote:
>
> 2025-05-22T14:43:40-07:00, Atish Patra <atish.patra at linux.dev>:
> > On 5/15/25 7:37 AM, Radim Krčmář wrote:
> >> Hello,
> >>
> >> the design still requires a discussion.
> >>
> >> [v3 1/2] removes most of the additional changes that the KVM capability
> >> was doing in v2.  [v3 2/2] is new and previews a general solution to the
> >> lack of userspace control over KVM SBI.
> >>
> >
> > I am still missing the motivation behind it. If the motivation is SBI
> > HSM suspend, the PATCH2 doesn't achieve that as it forwards every call
> > to the user space. Why do you want to control hsm start/stop from the
> > user space ?
>
> HSM needs fixing, because KVM doesn't know what the state after
> sbi_hart_start should be.
> For example, we had a discussion about scounteren and regardless of what
> default we choose in KVM, the userspace might want a different value.
> I don't think that HSM start/stop is a hot path, so trapping to
> userspace seems better than adding more kernel code.

There are no implementation specific S-mode CSR reset values
required at the moment. Whenever the need arises, we will extend
the ONE_REG interface so that user space can specify custom
CSR reset values at Guest/VM creation time. We don't need to
forward SBI HSM calls to user space for custom S-mode CSR
reset values.

>
> Forwarding all the unimplemented SBI ecalls shouldn't be a performance
> issue, because S-mode software would hopefully learn after the first
> error and stop trying again.
>
> Allowing userspace to fully implement the ecall instruction is one of the
> motivations as well -- SBI is not a part of RISC-V ISA, so someone might
> be interested in accelerating a different M-mode software with KVM.
>
> I'll send v4 later today -- there is a missing part in [2/2], because
> userspace also needs to be able to emulate the base SBI extension.
>

Emulating the entire SBI in user space has many challenges; here
are a few:

1) SBI IPI in userspace will require an ioctl to trigger a VCPU local
interrupt, and no such ioctl exists. We only have KVM ioctls to trigger
external interrupts and MSIs.

2) SBI RFENCE in userspace will require HFENCE operations in
user space, which is not allowed by the RISC-V ISA.

3) SBI PMU uses Linux perf framework APIs to share counters
between host and guest. The Linux perf APIs for guest perf events
are not available to userspace as syscall or ioctl.

4) SBI STA uses sched_info.run_delay which I am sure is not
available to user space.

5) SBI NACL when implemented will be using tons of HS-mode
functionality (HS-mode CSRs, HFENCEs, etc.) to achieve the
nested world-switch and none of these are accessible to userspace.

6) SBI FWFT may require programming hstateenX CSRs which
are not accessible to userspace.

7) SBI DBTR requires direct coordination between the KVM RISC-V
and kernel hw_breakpoint driver to share the debug triggers.

... and so on ...

Based on the above, emulating the entire SBI in user space is
a non-starter. The best approach is to selectively forward SBI
calls to user space where needed (e.g. SBI system reset,
SBI system suspend, SBI debug console, etc.).
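
To illustrate what "selectively forward" could mean, here is a minimal
sketch of a per-extension policy check. This is not the actual KVM
RISC-V code: the function name sbi_ext_forward_to_user() is hypothetical,
and the choice of forwarded extensions simply mirrors the examples named
above. Only the extension ID values are taken from the SBI specification.

```c
#include <stdbool.h>

/* SBI extension IDs as defined by the RISC-V SBI specification. */
#define SBI_EXT_IPI    0x735049UL    /* "sPI"  */
#define SBI_EXT_RFENCE 0x52464E43UL  /* "RFNC" */
#define SBI_EXT_HSM    0x48534DUL    /* "HSM"  */
#define SBI_EXT_SRST   0x53525354UL  /* "SRST" */
#define SBI_EXT_PMU    0x504D55UL    /* "PMU"  */
#define SBI_EXT_DBCN   0x4442434EUL  /* "DBCN" */
#define SBI_EXT_SUSP   0x53555350UL  /* "SUSP" */
#define SBI_EXT_STA    0x535441UL    /* "STA"  */

/*
 * Hypothetical policy helper (assumption, not a real KVM function):
 * returns true when an SBI call should be handed to user space and
 * false when it has to stay in the kernel.
 */
bool sbi_ext_forward_to_user(unsigned long ext_id)
{
	switch (ext_id) {
	case SBI_EXT_SRST:	/* system reset: VMM decides what happens */
	case SBI_EXT_SUSP:	/* system suspend: VMM owns the VM lifecycle */
	case SBI_EXT_DBCN:	/* debug console: VMM owns the console backend */
		return true;
	default:
		/*
		 * IPI, RFENCE, PMU, STA, etc. depend on HS-mode or
		 * in-kernel facilities, so this sketch keeps them
		 * (and anything unknown) in the kernel.
		 */
		return false;
	}
}
```

With this policy, sbi_ext_forward_to_user(SBI_EXT_SRST) is true while
sbi_ext_forward_to_user(SBI_EXT_RFENCE) is false.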

Regards,
Anup