[PATCH v3 13/19] KVM: arm64: Add support KVM_SYSTEM_EVENT_SUSPEND to PSCI SYSTEM_SUSPEND
Oliver Upton
oupton at google.com
Thu Feb 24 12:05:59 PST 2022
On Thu, Feb 24, 2022 at 03:40:15PM +0000, Marc Zyngier wrote:
> On Wed, 23 Feb 2022 04:18:38 +0000,
> Oliver Upton <oupton at google.com> wrote:
> >
> > Add a new system event type, KVM_SYSTEM_EVENT_SUSPEND, which indicates
> > to userspace that the guest has requested the VM be suspended. Userspace
> > can decide whether or not it wants to honor the guest's request by
> > changing the MP state of the vCPU. If it does not, userspace is
> > responsible for configuring the vCPU to return an error to the guest.
> > Document these expectations in the KVM API documentation.
> >
> > To preserve ABI, this new exit requires explicit opt-in from userspace.
> > Add KVM_CAP_ARM_SYSTEM_SUSPEND which grants userspace the ability to
> > opt-in to these exits on a per-VM basis.
> >
> > Signed-off-by: Oliver Upton <oupton at google.com>
> > ---
> > Documentation/virt/kvm/api.rst | 39 +++++++++++++++++++++++++++++++
> > arch/arm64/include/asm/kvm_host.h | 3 +++
> > arch/arm64/kvm/arm.c | 5 ++++
> > arch/arm64/kvm/psci.c | 5 ++++
> > include/uapi/linux/kvm.h | 2 ++
> > 5 files changed, 54 insertions(+)
> >
> > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > index 2b4bdbc2dcc0..1e207bbc01f5 100644
> > --- a/Documentation/virt/kvm/api.rst
> > +++ b/Documentation/virt/kvm/api.rst
> > @@ -5930,6 +5930,7 @@ should put the acknowledged interrupt vector into the 'epr' field.
> > #define KVM_SYSTEM_EVENT_RESET 2
> > #define KVM_SYSTEM_EVENT_CRASH 3
> > #define KVM_SYSTEM_EVENT_WAKEUP 4
> > + #define KVM_SYSTEM_EVENT_SUSPENDED 5
> > __u32 type;
> > __u64 flags;
> > } system_event;
> > @@ -5957,6 +5958,34 @@ Valid values for 'type' are:
> > - KVM_SYSTEM_EVENT_WAKEUP -- the guest is in a suspended state and KVM
> > has recognized a wakeup event. Userspace may honor this event by marking
> > the exiting vCPU as runnable, or deny it and call KVM_RUN again.
> > + - KVM_SYSTEM_EVENT_SUSPENDED -- the guest has requested a suspension of
> > + the VM.
> > +
> > +For arm/arm64:
> > +^^^^^^^^^^^^^^
> > +
> > + KVM_SYSTEM_EVENT_SUSPENDED exits are enabled with the
> > + KVM_CAP_ARM_SYSTEM_SUSPEND VM capability. If a guest successfully
> > + invokes the PSCI SYSTEM_SUSPEND function, KVM will exit to userspace
> > + with this event type.
> > +
> > + The guest's x2 register contains the 'entry_address' where execution
>
> x1?
>
> > + should resume when the VM is brought out of suspend. The guest's x3
>
> x2?
>
> > + register contains the 'context_id' corresponding to the request. When
> > + the guest resumes execution at 'entry_address', x0 should contain the
> > + 'context_id'. For more details on the SYSTEM_SUSPEND PSCI call, see
> > + ARM DEN0022D.b 5.19 "SYSTEM_SUSPEND".
>
> I'd refrain from paraphrasing too much of the spec, and direct the
> user to it. It will also avoid introducing bugs... ;-)
>
> Overall, "the guest" is super ambiguous, and echoes the questions I
> had earlier about what this means for an SMP system. Only one vcpu can
> restart the system, but which one?
>
> > +
> > + Userspace is _required_ to take action for such an exit. It must
> > + either:
> > +
> > + - Honor the guest request to suspend the VM. Userspace must reset
> > + the calling vCPU, then set PC to 'entry_address' and x0 to
> > + 'context_id'. Userspace may request in-kernel emulation of the
> > + suspension by setting the vCPU's state to KVM_MP_STATE_SUSPENDED.
>
> So here, you are actively saying that the calling vcpu should be the
> one being resumed. If that's the case (and assuming that this is a
> behaviour intended by the spec), something should prevent the other
> vcpus from being started.
>
> > +
> > + - Deny the guest request to suspend the VM. Userspace must set
> > + registers x1-x3 to 0 and set x0 to PSCI_RET_INTERNAL_ERROR (-6).
>
> Do you have any sort of userspace code that demonstrates this? It'd be
> super useful to see how that works on any publicly available VMM
> (qemu, kvmtool, or any of the ferric oxide based monsters).
>
> >
> > ::
> >
> > @@ -7580,3 +7609,13 @@ The argument to KVM_ENABLE_CAP is also a bitmask, and must be a subset
> > of the result of KVM_CHECK_EXTENSION. KVM will forward to userspace
> > the hypercalls whose corresponding bit is in the argument, and return
> > ENOSYS for the others.
> > +
> > +8.35 KVM_CAP_ARM_SYSTEM_SUSPEND
> > +-------------------------------
> > +
> > +:Capability: KVM_CAP_ARM_SYSTEM_SUSPEND
> > +:Architectures: arm64
> > +:Type: vm
> > +
> > +When enabled, KVM will exit to userspace with KVM_EXIT_SYSTEM_EVENT of
> > +type KVM_SYSTEM_EVENT_SUSPEND to process the guest suspend request.
> > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > index d32cab0c9752..e1c2ec18d1aa 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -146,6 +146,9 @@ struct kvm_arch {
> >
> > /* Memory Tagging Extension enabled for the guest */
> > bool mte_enabled;
> > +
> > + /* System Suspend Event exits enabled for the VM */
> > + bool system_suspend_exits;
>
> Gah... More of these. Please pick this patch:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/commit/?h=kvm-arm64/mmu/guest-MMIO-guard&id=7dd0a13a4217b870f2e83cdc6045e5ce482a5340
>
> > };
> >
> > struct kvm_vcpu_fault_info {
> > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> > index d2b190f32651..ce3f14a77a49 100644
> > --- a/arch/arm64/kvm/arm.c
> > +++ b/arch/arm64/kvm/arm.c
> > @@ -101,6 +101,10 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
> > }
> > mutex_unlock(&kvm->lock);
> > break;
> > + case KVM_CAP_ARM_SYSTEM_SUSPEND:
> > + r = 0;
> > + kvm->arch.system_suspend_exits = true;
> > + break;
> > default:
> > r = -EINVAL;
> > break;
> > @@ -209,6 +213,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> > case KVM_CAP_SET_GUEST_DEBUG:
> > case KVM_CAP_VCPU_ATTRIBUTES:
> > case KVM_CAP_PTP_KVM:
> > + case KVM_CAP_ARM_SYSTEM_SUSPEND:
> > r = 1;
> > break;
> > case KVM_CAP_SET_GUEST_DEBUG2:
> > diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c
> > index 2bb8d047cde4..a7de84cec2e4 100644
> > --- a/arch/arm64/kvm/psci.c
> > +++ b/arch/arm64/kvm/psci.c
> > @@ -245,6 +245,11 @@ static int kvm_psci_system_suspend(struct kvm_vcpu *vcpu)
> > return 1;
> > }
> >
> > + if (kvm->arch.system_suspend_exits) {
> > + kvm_vcpu_set_system_event_exit(vcpu, KVM_SYSTEM_EVENT_SUSPEND);
> > + return 0;
> > + }
> > +
>
> So there really is a difference in behaviour here. Userspace sees the
> WFI behaviour before reset (it implements it), while when not using
> the SUSPEND event, reset occurs before anything else.
>
> They really should behave in a similar way (WFI first, reset next).
I mentioned this on the other patch, but I think the conversation should
continue here as UAPI context is in this one.
If SUSPEND exits are disabled and SYSTEM_SUSPEND is implemented in the
kernel, userspace cannot observe any intermediate state. I think it is
necessary for migration, otherwise if userspace were to save the vCPU
post-WFI, pre-reset the pending reset would get lost along the way.
As far as userspace is concerned, I think the WFI+reset operation is
atomic. SUSPEND exits just allow userspace to intervene before said
atomic operation.
Perhaps I'm missing something: assuming SUSPEND exits are disabled, what
value is provided to userspace if it can see WFI behavior before the
reset?
--
Thanks,
Oliver
More information about the kvm-riscv
mailing list