[PATCH v3 13/19] KVM: arm64: Add support KVM_SYSTEM_EVENT_SUSPEND to PSCI SYSTEM_SUSPEND

Oliver Upton oupton at google.com
Wed Mar 2 01:57:08 PST 2022


On Wed, Mar 2, 2022 at 1:52 AM Marc Zyngier <maz at kernel.org> wrote:
>
> On Sat, 26 Feb 2022 18:28:21 +0000,
> Oliver Upton <oupton at google.com> wrote:
> >
> > On Sat, Feb 26, 2022 at 3:29 AM Marc Zyngier <maz at kernel.org> wrote:
> > >
> > > On Thu, 24 Feb 2022 20:05:59 +0000,
> > > Oliver Upton <oupton at google.com> wrote:
> > > >
> > > > On Thu, Feb 24, 2022 at 03:40:15PM +0000, Marc Zyngier wrote:
> > > > > > diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c
> > > > > > index 2bb8d047cde4..a7de84cec2e4 100644
> > > > > > --- a/arch/arm64/kvm/psci.c
> > > > > > +++ b/arch/arm64/kvm/psci.c
> > > > > > @@ -245,6 +245,11 @@ static int kvm_psci_system_suspend(struct kvm_vcpu *vcpu)
> > > > > >           return 1;
> > > > > >   }
> > > > > >
> > > > > > + if (kvm->arch.system_suspend_exits) {
> > > > > > +         kvm_vcpu_set_system_event_exit(vcpu, KVM_SYSTEM_EVENT_SUSPEND);
> > > > > > +         return 0;
> > > > > > + }
> > > > > > +
> > > > >
> > > > > So there really is a difference in behaviour here. Userspace sees the
> > > > > WFI behaviour before reset (it implements it), while when not using
> > > > > the SUSPEND event, reset occurs before anything else.
> > > > >
> > > > > They really should behave in a similar way (WFI first, reset next).
> > > >
> > > > I mentioned this on the other patch, but I think the conversation should
> > > > continue here as UAPI context is in this one.
> > > >
> > > > If SUSPEND exits are disabled and SYSTEM_SUSPEND is implemented in the
> > > > kernel, userspace cannot observe any intermediate state. I think it is
> > > > necessary for migration, otherwise if userspace were to save the vCPU
> > > > post-WFI, pre-reset the pending reset would get lost along the way.
> > > >
> > > > As far as userspace is concerned, I think the WFI+reset operation is
> > > > atomic. SUSPEND exits just allow userspace to intervene before said
> > > > atomic operation.
> > > >
> > > > Perhaps I'm missing something: assuming SUSPEND exits are disabled, what
> > > > value is provided to userspace if it can see WFI behavior before the
> > > > reset?
> > >
> > > Signals get in the way, and break the notion of atomicity. Userspace
> > > *will* observe this.
> > >
> > > I agree that save/restore is an important point, and that snapshoting
> > > the guest at this stage should capture the reset value. But it is the
> > > asymmetry of the behaviours that I find jarring:
> > >
> > > - if you ask for userspace exit, no reset value is applied and you
> > >   need to implement the reset in userspace
> > >
> > > - if you *don't* ask for a userspace exit, the reset values are
> > >   applied, and a signal while in WFI will result in this reset being
> > >   observed
> > >
> > > Why can't the userspace exit path also apply the reset values *before*
> > > exiting? After all, you can model this exit to userspace as
> > > reset+WFI+'spurious exit from WFI'. This would at least unify the two
> > > behaviours.
> >
> > I hesitated applying the reset context to the CPU before the userspace
> > exit because that would be wildly different from the other system
> > events. Userspace wouldn't have much choice but to comply with the
> > guest request at that point.
> >
> > What about adopting the following:
> >
> >  - Drop the in-kernel SYSTEM_SUSPEND emulation. I think you were
> > getting at this point in [1], and I'd certainly be open to it. Without
> > a userspace exit, I don't think there is anything meaningfully
> > different between this call and a WFI instruction.
>
> The only difference is the reset part. And I agree, it only makes the
> kernel part more complicated than we strictly need it to be. It also
> slightly clashes with the rest of the system events, in the sense that
> it is the only one that would have an in-kernel implementation (both
> reboot and power-off are entirely implemented in userspace).
>
> So I definitely agree about dropping this.
>
> >
> >  - Add data to the kvm_run structure to convey the reset state for a
> > SYSTEM_SUSPEND exit. There's plenty of room left in the structure for
> > more, and can be done generically (just an array of data) for future
> > expansion. We already are going to need a code change in userspace to
> > do this right, so may as well update its view of kvm_run along the
> > way.
>
> The reset state is already available in the guest registers, which are
> available to userspace. What else do we need to expose?

Nothing. It is just a slight nitnoid thing for me where
KVM_EXIT_SYSTEM_SUSPEND behaves a bit differently than the others. If
a VMM wants to reject the call, it needs to manually set up the SMCCC
return value, whereas on the others a naive call to KVM_RUN will do
the job since KVM already sets up the failure value.

Unsure if this warrants a kvm_run change, leaning towards no if it is
well documented.

> >  - Exit to userspace with PSCI_RET_INTERNAL_FAILURE queued up for the
> > guest. Doing so keeps the exits consistent with the other system
> > exits, and affords userspace the ability to deny the call when it
> > wants to.
>
> Yup, that's what I like about pushing this completely to userspace.
>
> >
> > [1]: http://lore.kernel.org/r/87fso63ha2.wl-maz@kernel.org
> >
> > > I still dislike the reset state being applied early, but consistency
> > > (and save/restore) trumps taste here. I know I'm being pedantic here,
> > > but we've been burned with loosely defined semantics in the past, and
> > > I want to get this right. Or less wrong.
> >
> > I completely agree with you. The semantics are a bit funky, and I
> > really do wonder if the easiest way around that is to just make the
> > implementation a userspace problem.
>
> We're in violent agreement.

Lol

> It means that we only need the MP_STATE
> part to implement WFI from userspace.
>
> Could you try and respin this? Also, it'd be good to see a prototype
> of userspace code using this, as this is a new API.

Sure thing. I'll keep it to kvmtool, since that's the most familiar to
me. Also, I think I had an RFC for kvmtool many moons ago...

--
Oliver



More information about the kvm-riscv mailing list