[RFC][PATCH] kvm: add suspend pm-notifier

Fri Jun 4 03:03:50 PDT 2021

On Fri, 04 Jun 2021 10:20:28 +0100,
Sergey Senozhatsky <senozhatsky at chromium.org> wrote:
> 
> On (21/06/04 09:46), Marc Zyngier wrote:
> [..]
> > > +void kvm_arch_pm_notifier(struct kvm *kvm)
> > > +{
> > > +#ifdef CONFIG_PM
> > > +	int c;
> > > +
> > > +	mutex_lock(&kvm->lock);
> > > +	for (c = 0; c < kvm->created_vcpus; c++) {
> > > +		struct kvm_vcpu *vcpu = kvm->vcpus[c];
> > > +		int r;
> > > +
> > > +		if (!vcpu)
> > > +			continue;
> > 
> > Wouldn't kvm_for_each_vcpu() avoid this kind of checks?
> 
> Right, that's what I do in v2, which I haven't posted yet.
> 
> [..]
> > > +#include <linux/notifier.h>
> > > +
> > >  #ifndef KVM_MAX_VCPU_ID
> > >  #define KVM_MAX_VCPU_ID KVM_MAX_VCPUS
> > >  #endif
> > > @@ -579,6 +581,10 @@ struct kvm {
> > >  	pid_t userspace_pid;
> > >  	unsigned int max_halt_poll_ns;
> > >  	u32 dirty_ring_size;
> > > +
> > > +#ifdef CONFIG_PM
> > > +	struct notifier_block pm_notifier;
> > > +#endif
> > 
> > I'd certainly like to be able to opt out from this on architectures
> > that do not implement anything useful in the PM callbacks.
> 
> Well on the other hand PM-callbacks are harmless on those archs, they
> won't overload the __weak function.

I don't care much for the callbacks. But struct kvm is bloated enough,
and I'd be happy not to have this structure embedded in it if I can
avoid it.

> 
> > Please consider making this an independent config option that individual
> > archs can buy into.
> 
> Sure, I can take a look into this, but how is this better than __weak
> function? (that's a real question)

Weak functions are OK AFAIC. More crud in struct kvm is what I'm not
OK with.

> 
> [..]
> > > +#ifdef CONFIG_PM
> > > +static int kvm_pm_notifier_call(struct notifier_block *bl,
> > > +				unsigned long state,
> > > +				void *unused)
> > > +{
> > > +	struct kvm *kvm = container_of(bl, struct kvm, pm_notifier);
> > > +
> > > +	switch (state) {
> > > +	case PM_HIBERNATION_PREPARE:
> > > +	case PM_SUSPEND_PREPARE:
> > > +		kvm_arch_pm_notifier(kvm);
> > 
> > How about passing the state to the notifier callback? I'd expect it to
> > be useful to do something on resume too.
> 
> For different states we can have different kvm_arch functions instead.
> kvm_arch_pm_notifier() can be renamed to kvm_arch_suspend_notifier(),
> so that we don't need to have `switch (state)` in every arch-code. Then
> for resume/post resume states we can have kvm_arch_resume_notifier()
> arch functions.

I'd rather we keep an arch API that is similar to the one the rest of
the kernel has, instead of a flurry of small helpers that need to grow
each time someone adds a new PM state. A switch() in the arch-specific
implementation is absolutely fine.

> 
> > > +		break;
> > > +	}
> > > +	return NOTIFY_DONE;
> > > +}
> > > +
> > > +static void kvm_init_pm_notifier(struct kvm *kvm)
> > > +{
> > > +	kvm->pm_notifier.notifier_call = kvm_pm_notifier_call;
> > > +	kvm->pm_notifier.priority = INT_MAX;
> > 
> > How is this priority determined?
> 
> Nothing magical here. I want this to be executed first, before we suspend
> ftrace, RCU and the like. Besides KVM is usually the last one to register
> its PM callbacks, so there can be something on the notifier list that
> returns NOTIFY_STOP_MASK in front of KVM PM-notifier list entry.

Which begs the question: should arch-specific code be able to veto
suspend and return an error itself? Always returning NOTIFY_DONE seems
highly optimistic.

	M.

-- 
Without deviation from the norm, progress is not possible.