[PATCH v3 0/2] RISC-V: KVM: VCPU reset fixes
Anup Patel
apatel at ventanamicro.com
Sat May 24 02:59:24 PDT 2025
On Fri, May 23, 2025 at 2:50 PM Radim Krčmář <rkrcmar at ventanamicro.com> wrote:
>
> 2025-05-23T13:38:26+05:30, Anup Patel <apatel at ventanamicro.com>:
> > On Fri, May 23, 2025 at 12:47 PM Radim Krčmář <rkrcmar at ventanamicro.com> wrote:
> >>
> >> 2025-05-22T14:43:40-07:00, Atish Patra <atish.patra at linux.dev>:
> >> > On 5/15/25 7:37 AM, Radim Krčmář wrote:
> >> >> Hello,
> >> >>
> >> >> the design still requires a discussion.
> >> >>
> >> >> [v3 1/2] removes most of the additional changes that the KVM capability
> >> >> was doing in v2. [v3 2/2] is new and previews a general solution to the
> >> >> lack of userspace control over KVM SBI.
> >> >>
> >> >
> >> > I am still missing the motivation behind it. If the motivation is SBI
> >> > HSM suspend, the PATCH2 doesn't achieve that as it forwards every call
> >> > to the user space. Why do you want to control hsm start/stop from the
> >> > user space ?
> >>
> >> HSM needs fixing, because KVM doesn't know what the state after
> >> sbi_hart_start should be.
> >> For example, we had a discussion about scounteren and regardless of what
> >> default we choose in KVM, the userspace might want a different value.
> >> I don't think that HSM start/stop is a hot path, so trapping to
> >> userspace seems better than adding more kernel code.
> >
> > There are no implementation specific S-mode CSR reset values
> > required at the moment.
>
> Jessica mentioned that BSD requires scounteren to be non-zero, so
> userspace should be able to provide that value.
>
> I would prefer if KVM could avoid getting into those discussions.
> We can just let userspace be as crazy as it wants.
The supervisor OS must not expect a particular state of S-mode
CSRs other than what is defined in the boot protocol or the SBI
specification.
As mentioned before, the scounteren setup in KVM RISC-V and
OpenSBI is a HACK for buggy OSes which don't set up the scounteren
CSR correctly when a HART comes up. Even KVM user space
should not entertain such HACKs.
>
> > Whenever the need arises, we will extend
> > the ONE_REG interface so that user space can specify custom
> > CSR reset values at Guest/VM creation time. We don't need to
> > forward SBI HSM calls to user space for custom S-mode CSR
> > reset values.
>
> The benefits of adding a new ONE_REG interface seem very small compared
> to the drawbacks of having extra kernel code.
Forwarding HSM stop to userspace will slow down CPU hotplug
on the Guest side. Further, this directly impacts SBI system suspend
performance for the Guest because the Guest is supposed to turn off all
VCPUs except one before entering the SBI system suspend state.
>
> If userspace would want to reset or setup new multi-VCPUs VMs often, we
> could add an interface that loads the whole register state from
> userspace in a single IOCTL, because ONE_REG is not the best interface
> for bulk data transfer either.
Instead of inventing a new interface, we can simply improve the
ONE_REG interface to allow bulk read/write of multiple ONE_REG
registers which will benefit other architectures as well.
If required in the future, this bulk ONE_REG read/write interface
can also be used to load reset state of VCPU CSRs.
>
> >> Forwarding all the unimplemented SBI ecalls shouldn't be a performance
> >> issue, because S-mode software would hopefully learn after the first
> >> error and stop trying again.
> >>
> >> Allowing userspace to fully implement the ecall instruction is one of the
> >> motivations as well -- SBI is not a part of RISC-V ISA, so someone might
> >> be interested in accelerating a different M-mode software with KVM.
> >>
> >> I'll send v4 later today -- there is a missing part in [2/2], because
> >> userspace also needs to be able to emulate the base SBI extension.
> >>
> >
> > [...] The best approach is to selectively forward SBI
> > calls to user space where needed (e.g. SBI system reset,
> > SBI system suspend, SBI debug console, etc.).
>
> That is exactly what my proposal does, it's just that the userspace says
> what is "needed".
Nope, the approach taken by your patch is problematic because
for example userspace might disable SBI RFENCE or SBI PMU
with no means to implement these SBI extensions in user space.
We can't blindly forward an SBI extension to userspace when
userspace lacks the capability to implement this extension.
>
> If we started with this mechanism, KVM would not have needed to add
> SRST/SUSP/DBCN SBI emulation at all -- they would be forwarded as any
> other unhandled ecall.
SBI SRST extension is implemented in kernel space because
we are re-using the existing KVM_EXIT_SYSTEM_EVENT so
that we can also re-use existing KVM_EXIT_SYSTEM_EVENT
related code on userspace side.
SBI SUSP and DBCN are already forwarded to user space and
we only have minimal code in kernel space to ensure that:
1) In-kernel SBI BASE extension is aware of these extensions
2) These are forwarded to userspace only when userspace
enables these extensions.
In addition to the above, we are blindly forwarding SBI
experimental and vendor extensions to user space so
user space can do its own thing by implementing these
extensions.
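
To make the routing described above concrete, here is a rough sketch
of the decision only. It is not the actual KVM code: route_sbi_call(),
enum sbi_route and struct user_policy are hypothetical names made up
for illustration, and treating a disabled extension as "not supported"
is an assumption of the sketch. Only the extension IDs and the
experimental/vendor ranges are taken from the SBI specification.

```c
#include <stdbool.h>
#include <stdint.h>

/* Extension IDs and ranges as defined by the SBI specification. */
#define SBI_EXT_BASE               0x10UL
#define SBI_EXT_RFENCE             0x52464E43UL
#define SBI_EXT_HSM                0x48534DUL
#define SBI_EXT_SRST               0x53525354UL
#define SBI_EXT_PMU                0x504D55UL
#define SBI_EXT_DBCN               0x4442434EUL
#define SBI_EXT_SUSP               0x53555350UL
#define SBI_EXT_EXPERIMENTAL_START 0x08000000UL
#define SBI_EXT_EXPERIMENTAL_END   0x08FFFFFFUL
#define SBI_EXT_VENDOR_START       0x09000000UL
#define SBI_EXT_VENDOR_END         0x09FFFFFFUL

/* Hypothetical: where a guest SBI call ends up. */
enum sbi_route { ROUTE_KERNEL, ROUTE_USER, ROUTE_NOT_SUPPORTED };

/* Hypothetical: which forwardable extensions user space enabled. */
struct user_policy {
	bool dbcn_enabled;
	bool susp_enabled;
};

static enum sbi_route route_sbi_call(unsigned long ext_id,
				     const struct user_policy *p)
{
	/* Experimental and vendor ranges are always forwarded. */
	if (ext_id >= SBI_EXT_EXPERIMENTAL_START &&
	    ext_id <= SBI_EXT_EXPERIMENTAL_END)
		return ROUTE_USER;
	if (ext_id >= SBI_EXT_VENDOR_START && ext_id <= SBI_EXT_VENDOR_END)
		return ROUTE_USER;

	switch (ext_id) {
	case SBI_EXT_DBCN:
		/* Forwarded only when user space opted in. */
		return p->dbcn_enabled ? ROUTE_USER : ROUTE_NOT_SUPPORTED;
	case SBI_EXT_SUSP:
		return p->susp_enabled ? ROUTE_USER : ROUTE_NOT_SUPPORTED;
	case SBI_EXT_BASE:
	case SBI_EXT_RFENCE:
	case SBI_EXT_HSM:
	case SBI_EXT_PMU:
	case SBI_EXT_SRST:
		/*
		 * Handled in kernel space. SRST then reaches user space
		 * through the existing KVM_EXIT_SYSTEM_EVENT exit rather
		 * than as a forwarded SBI call.
		 */
		return ROUTE_KERNEL;
	default:
		return ROUTE_NOT_SUPPORTED;
	}
}
```

The point of the sketch is that user space can only switch on or off
the extensions it is able to implement, while RFENCE and PMU never
leave the kernel.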
Regards,
Anup