[PATCH v3 09/19] KVM: arm64: Implement PSCI SYSTEM_SUSPEND

Thu Feb 24 11:35:33 PST 2022

Hi Marc,

Thanks for reviewing the series. ACK to the nits and smaller comments
you've made, I'll incorporate that feedback in the next series.

On Thu, Feb 24, 2022 at 02:02:34PM +0000, Marc Zyngier wrote:
> On Wed, 23 Feb 2022 04:18:34 +0000,
> Oliver Upton <oupton at google.com> wrote:
> > 
> > ARM DEN0022D.b 5.19 "SYSTEM_SUSPEND" describes a PSCI call that allows
> > software to request that a system be placed in the deepest possible
> > low-power state. Effectively, software can use this to suspend itself to
> > RAM. Note that the semantics of this PSCI call are very similar to
> > CPU_SUSPEND, which is already implemented in KVM.
> > 
> > Implement the SYSTEM_SUSPEND in KVM. Similar to CPU_SUSPEND, the
> > low-power state is implemented as a guest WFI. Synchronously reset the
> > calling CPU before entering the WFI, such that the vCPU may immediately
> > resume execution when a wakeup event is recognized.
> > 
> > Signed-off-by: Oliver Upton <oupton at google.com>
> > ---
> >  arch/arm64/kvm/psci.c  | 51 ++++++++++++++++++++++++++++++++++++++++++
> >  arch/arm64/kvm/reset.c |  3 ++-
> >  2 files changed, 53 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c
> > index 77a00913cdfd..41adaaf2234a 100644
> > --- a/arch/arm64/kvm/psci.c
> > +++ b/arch/arm64/kvm/psci.c
> > @@ -208,6 +208,50 @@ static void kvm_psci_system_reset(struct kvm_vcpu *vcpu)
> >  	kvm_prepare_system_event(vcpu, KVM_SYSTEM_EVENT_RESET);
> >  }
> >  
> > +static int kvm_psci_system_suspend(struct kvm_vcpu *vcpu)
> > +{
> > +	struct vcpu_reset_state reset_state;
> > +	struct kvm *kvm = vcpu->kvm;
> > +	struct kvm_vcpu *tmp;
> > +	bool denied = false;
> > +	unsigned long i;
> > +
> > +	reset_state.pc = smccc_get_arg1(vcpu);
> > +	if (!kvm_ipa_valid(kvm, reset_state.pc)) {
> > +		smccc_set_retval(vcpu, PSCI_RET_INVALID_ADDRESS, 0, 0, 0);
> > +		return 1;
> > +	}
> > +
> > +	reset_state.r0 = smccc_get_arg2(vcpu);
> > +	reset_state.be = kvm_vcpu_is_be(vcpu);
> > +	reset_state.reset = true;
> > +
> > +	/*
> > +	 * The SYSTEM_SUSPEND PSCI call requires that all vCPUs (except the
> > +	 * calling vCPU) be in an OFF state, as determined by the
> > +	 * implementation.
> > +	 *
> > +	 * See ARM DEN0022D, 5.19 "SYSTEM_SUSPEND" for more details.
> > +	 */
> > +	mutex_lock(&kvm->lock);
> > +	kvm_for_each_vcpu(i, tmp, kvm) {
> > +		if (tmp != vcpu && !kvm_arm_vcpu_powered_off(tmp)) {
> > +			denied = true;
> > +			break;
> > +		}
> > +	}
> > +	mutex_unlock(&kvm->lock);
> 
> This looks dodgy. Nothing seems to prevent userspace from setting the
> mp_state to RUNNING in parallel with this, as only the vcpu mutex is
> held when this ioctl is issued.
> 
> It looks to me that what you want is what lock_all_vcpus() does
> (Alexandru has a patch moving it out of the vgic code as part of his
> SPE series).
> 
> It is also pretty unclear what the interaction with userspace is once
> you have released the lock. If the VMM starts a vcpu other than the
> suspending one, what is its state? The spec doesn't see to help
> here. I can see two options:
> 
> - either all the vcpus have the same reset state applied to them as
>   they come up, unless they are started with CPU_ON by a vcpu that has
>   already booted (but there is a single 'context_id' provided, and I
>   fear this is going to confuse the OS)...
> 
> - or only the suspending vcpu can resume the system, and we must fail
>   a change of mp_state for the other vcpus.
> 
> What do you think?

Definitely the latter. The documentation of SYSTEM_SUSPEND is quite
shaky on this, but it would appear that the intention is for the caller
to be the first CPU to wake up.

> > +
> > +	if (denied) {
> > +		smccc_set_retval(vcpu, PSCI_RET_DENIED, 0, 0, 0);
> > +		return 1;
> > +	}
> > +
> > +	__kvm_reset_vcpu(vcpu, &reset_state);
> > +	kvm_vcpu_wfi(vcpu);
> 
> I have mixed feelings about this. The vcpu has reset before being in
> WFI, while it really should be the other way around and userspace
> could rely on observing the transition.
> 
> What breaks if you change this?

I don't think that userspace would be able to observe the transition
even if we WFI before the reset. I imagine that would take the form
of setting KVM_REQ_VCPU_RESET, which we explicitly handle before
letting userspace access the vCPU's state as of commit
6826c6849b46 ("KVM: arm64: Handle PSCI resets before userspace
touches vCPU state").

Given this, I felt it was probably best to avoid all the indirection and
just do the vCPU reset in the handling of SYSTEM_SUSPEND. It does,
however, imply that we have slightly different behavior when userspace
exits are enabled, as that will happen pre-reset and pre-WFI.

--
Oliver