[PATCH v8 4/7] KVM: x86: Report host tsc and realtime values in KVM_GET_CLOCK

Marcelo Tosatti mtosatti at redhat.com
Thu Sep 30 12:21:07 PDT 2021


On Wed, Sep 29, 2021 at 03:56:29PM -0300, Marcelo Tosatti wrote:
> Oliver,
> 
> Do you have any numbers for the improvement in the guest's CLOCK_REALTIME
> accuracy across migration when this is in place?
> 
> On Thu, Sep 16, 2021 at 06:15:35PM +0000, Oliver Upton wrote:
> > Handling the migration of TSCs correctly is difficult, in part because
> > Linux does not provide userspace with the ability to retrieve a (TSC,
> > realtime) clock pair for a single instant in time. In lieu of a more
> > convenient facility, KVM can report similar information in the kvm_clock
> > structure.
> > 
> > Provide userspace with a host TSC & realtime pair iff the realtime clock
> > is based on the TSC. If userspace provides KVM_SET_CLOCK with a valid
> > realtime value, advance the KVM clock by the amount of elapsed time. Do
> > not step the KVM clock backwards, though, as it is a monotonic
> > oscillator.
> > 
> > Suggested-by: Paolo Bonzini <pbonzini at redhat.com>
> > Signed-off-by: Oliver Upton <oupton at google.com>
> > ---
> >  Documentation/virt/kvm/api.rst  | 42 ++++++++++++++++++++++++++-------
> >  arch/x86/include/asm/kvm_host.h |  3 +++
> >  arch/x86/kvm/x86.c              | 36 +++++++++++++++++++++-------
> >  include/uapi/linux/kvm.h        |  7 +++++-
> >  4 files changed, 70 insertions(+), 18 deletions(-)
> > 
> > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > index a6729c8cf063..d0b9c986cf6c 100644
> > --- a/Documentation/virt/kvm/api.rst
> > +++ b/Documentation/virt/kvm/api.rst
> > @@ -993,20 +993,34 @@ such as migration.
> >  When KVM_CAP_ADJUST_CLOCK is passed to KVM_CHECK_EXTENSION, it returns the
> >  set of bits that KVM can return in struct kvm_clock_data's flag member.
> >  
> > -The only flag defined now is KVM_CLOCK_TSC_STABLE.  If set, the returned
> > -value is the exact kvmclock value seen by all VCPUs at the instant
> > -when KVM_GET_CLOCK was called.  If clear, the returned value is simply
> > -CLOCK_MONOTONIC plus a constant offset; the offset can be modified
> > -with KVM_SET_CLOCK.  KVM will try to make all VCPUs follow this clock,
> > -but the exact value read by each VCPU could differ, because the host
> > -TSC is not stable.
> > +FLAGS:
> > +
> > +KVM_CLOCK_TSC_STABLE.  If set, the returned value is the exact kvmclock
> > +value seen by all VCPUs at the instant when KVM_GET_CLOCK was called.
> > +If clear, the returned value is simply CLOCK_MONOTONIC plus a constant
> > +offset; the offset can be modified with KVM_SET_CLOCK.  KVM will try
> > +to make all VCPUs follow this clock, but the exact value read by each
> > +VCPU could differ, because the host TSC is not stable.
> > +
> > +KVM_CLOCK_REALTIME.  If set, the `realtime` field in the kvm_clock_data
> > +structure is populated with the value of the host's real time
> > +clocksource at the instant when KVM_GET_CLOCK was called. If clear,
> > +the `realtime` field does not contain a value.
> > +
> > +KVM_CLOCK_HOST_TSC.  If set, the `host_tsc` field in the kvm_clock_data
> > +structure is populated with the value of the host's timestamp counter (TSC)
> > +at the instant when KVM_GET_CLOCK was called. If clear, the `host_tsc` field
> > +does not contain a value.
> >  
> >  ::
> >  
> >    struct kvm_clock_data {
> >  	__u64 clock;  /* kvmclock current value */
> >  	__u32 flags;
> > -	__u32 pad[9];
> > +	__u32 pad0;
> > +	__u64 realtime;
> > +	__u64 host_tsc;
> > +	__u32 pad[4];
> >    };
> >  
> >  
> > @@ -1023,12 +1037,22 @@ Sets the current timestamp of kvmclock to the value specified in its parameter.
> >  In conjunction with KVM_GET_CLOCK, it is used to ensure monotonicity on scenarios
> >  such as migration.
> >  
> > +FLAGS:
> > +
> > +KVM_CLOCK_REALTIME.  If set, KVM will compare the value of the `realtime` field
> > +with the value of the host's real time clocksource at the instant when
> > +KVM_SET_CLOCK was called. The difference in elapsed time is added to the final
> > +kvmclock value that will be provided to guests.
> > +
> >  ::
> >  
> >    struct kvm_clock_data {
> >  	__u64 clock;  /* kvmclock current value */
> >  	__u32 flags;
> > -	__u32 pad[9];
> > +	__u32 pad0;
> > +	__u64 realtime;
> > +	__u64 host_tsc;
> > +	__u32 pad[4];
> >    };
> >  
> >  
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index be6805fc0260..9c34b5b63e39 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -1936,4 +1936,7 @@ int kvm_cpu_dirty_log_size(void);
> >  
> >  int alloc_all_memslots_rmaps(struct kvm *kvm);
> >  
> > +#define KVM_CLOCK_VALID_FLAGS						\
> > +	(KVM_CLOCK_TSC_STABLE | KVM_CLOCK_REALTIME | KVM_CLOCK_HOST_TSC)
> > +
> >  #endif /* _ASM_X86_KVM_HOST_H */
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 523c4e5c109f..cb5d5cad5124 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -2815,10 +2815,20 @@ static void get_kvmclock(struct kvm *kvm, struct kvm_clock_data *data)
> >  	get_cpu();
> >  
> >  	if (__this_cpu_read(cpu_tsc_khz)) {
> > +#ifdef CONFIG_X86_64
> > +		struct timespec64 ts;
> > +
> > +		if (kvm_get_walltime_and_clockread(&ts, &data->host_tsc)) {
> > +			data->realtime = ts.tv_nsec + NSEC_PER_SEC * ts.tv_sec;
> > +			data->flags |= KVM_CLOCK_REALTIME | KVM_CLOCK_HOST_TSC;
> > +		} else
> > +#endif
> > +		data->host_tsc = rdtsc();
> > +
> >  		kvm_get_time_scale(NSEC_PER_SEC, __this_cpu_read(cpu_tsc_khz) * 1000LL,
> >  				   &hv_clock.tsc_shift,
> >  				   &hv_clock.tsc_to_system_mul);
> > -		data->clock = __pvclock_read_cycles(&hv_clock, rdtsc());
> > +		data->clock = __pvclock_read_cycles(&hv_clock, data->host_tsc);
> >  	} else {
> >  		data->clock = get_kvmclock_base_ns() + ka->kvmclock_offset;
> >  	}
> > @@ -4062,7 +4072,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> >  		r = KVM_SYNC_X86_VALID_FIELDS;
> >  		break;
> >  	case KVM_CAP_ADJUST_CLOCK:
> > -		r = KVM_CLOCK_TSC_STABLE;
> > +		r = KVM_CLOCK_VALID_FLAGS;
> >  		break;
> >  	case KVM_CAP_X86_DISABLE_EXITS:
> >  		r |=  KVM_X86_DISABLE_EXITS_HLT | KVM_X86_DISABLE_EXITS_PAUSE |
> > @@ -5859,12 +5869,12 @@ static int kvm_vm_ioctl_set_clock(struct kvm *kvm, void __user *argp)
> >  {
> >  	struct kvm_arch *ka = &kvm->arch;
> >  	struct kvm_clock_data data;
> > -	u64 now_ns;
> > +	u64 now_raw_ns;
> >  
> >  	if (copy_from_user(&data, argp, sizeof(data)))
> >  		return -EFAULT;
> >  
> > -	if (data.flags)
> > +	if (data.flags & ~KVM_CLOCK_REALTIME)
> >  		return -EINVAL;
> >  
> >  	kvm_hv_invalidate_tsc_page(kvm);
> > @@ -5878,11 +5888,21 @@ static int kvm_vm_ioctl_set_clock(struct kvm *kvm, void __user *argp)
> >  	 * is slightly ahead) here we risk going negative on unsigned
> >  	 * 'system_time' when 'data.clock' is very small.
> >  	 */
> > -	if (kvm->arch.use_master_clock)
> > -		now_ns = ka->master_kernel_ns;
> > +	if (data.flags & KVM_CLOCK_REALTIME) {
> > +		u64 now_real_ns = ktime_get_real_ns();
> > +
> > +		/*
> > +		 * Avoid stepping the kvmclock backwards.
> > +		 */
> > +		if (now_real_ns > data.realtime)
> > +			data.clock += now_real_ns - data.realtime;
> > +	}
> 
> Forward jumps can also cause problems, for example:
> 
> * Kernel watchdogs
> 
> * https://patchwork.ozlabs.org/project/qemu-devel/patch/20130618233825.GA19042@amt.cnet/
> 
> So perhaps it would be a good idea to limit the size of the forward
> jump that is allowed? (A large jump can happen if the two hosts'
> realtime clocks are not in sync.)
> 
> By how much, I am not sure.
> 
> Or, as mentioned earlier, only enable KVM_CLOCK_REALTIME if the
> userspace KVM code checks clock synchronization.
> 
> Thomas, CC'ed, has a deeper understanding of the problems with
> forward time jumps than I do. Thomas, any comments?
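
A minimal sketch of the adjustment discussed above, written as plain userspace C rather than kernel code. It mirrors what the patch does in kvm_vm_ioctl_set_clock() (add the elapsed host realtime to the saved kvmclock value, and never step it backwards) and adds the forward-jump limit proposed above. The function name adjust_kvmclock() and the MAX_FWD_JUMP_NS cap of 10 s are hypothetical and arbitrary; the patch as posted applies no limit, and clamping is only one possible policy (rejecting the request with an error would be another).

```c
#include <stdint.h>

/* Hypothetical cap on the forward adjustment (10 s, arbitrary choice). */
#define MAX_FWD_JUMP_NS (10ULL * 1000000000ULL)

/*
 * Compute the kvmclock value to install on the destination.
 *   saved_clock:    kvmclock value returned by KVM_GET_CLOCK on the source
 *   saved_realtime: host realtime (ns) reported alongside it
 *   now_realtime:   destination host realtime (ns) at KVM_SET_CLOCK time
 *   max_fwd_ns:     largest forward adjustment allowed
 */
uint64_t adjust_kvmclock(uint64_t saved_clock, uint64_t saved_realtime,
			 uint64_t now_realtime, uint64_t max_fwd_ns)
{
	uint64_t delta;

	/* As in the patch: never step the kvmclock backwards. */
	if (now_realtime <= saved_realtime)
		return saved_clock;

	delta = now_realtime - saved_realtime;

	/*
	 * Not in the patch: clamp large forward jumps, e.g. when the two
	 * hosts' realtime clocks are not in sync.
	 */
	if (delta > max_fwd_ns)
		delta = max_fwd_ns;

	return saved_clock + delta;
}
```

For example, with a saved kvmclock of 5 s, adjust_kvmclock(5000000000, 0, 60000000000, MAX_FWD_JUMP_NS) returns 15000000000: the 60 s realtime gap is clamped to 10 s.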

Thomas,

Based on the earlier discussion about the problems of synchronizing
the guest's clock via a notification to the NTP/Chrony daemon
(there is a window in which applications can read the stale
value of the clock), a possible solution would be to trigger
an NMI on the destination, so that the update runs ASAP, with higher
priority than applications or the kernel.

What would this NMI do, exactly?

> As a note: this means the KVM_CLOCK_REALTIME flag should not be used
> for either VM pause/resume (at least if the VM is paused for long periods
> of time) or savevm/restorevm.

Maybe with the NMI above, it would be possible to use
the realtime clock to measure the time elapsed between
events and advance the guest clock without the current
problematic window.
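
A minimal userspace sketch of measuring elapsed host realtime between two events with CLOCK_REALTIME. It converts a struct timespec to nanoseconds the same way the patch computes data->realtime (tv_sec * NSEC_PER_SEC + tv_nsec) and keeps the difference signed, so a backwards step of the realtime clock shows up as a negative value instead of wrapping around. Across two hosts the result is only as good as their realtime synchronization, which is the concern raised above. The helper names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL

/* Same conversion the patch uses for data->realtime. */
int64_t timespec_to_ns(const struct timespec *ts)
{
	return (int64_t)ts->tv_sec * NSEC_PER_SEC + ts->tv_nsec;
}

/* Sample CLOCK_REALTIME in nanoseconds; returns false on failure. */
bool realtime_now_ns(int64_t *out_ns)
{
	struct timespec ts;

	if (clock_gettime(CLOCK_REALTIME, &ts) != 0)
		return false;
	*out_ns = timespec_to_ns(&ts);
	return true;
}

/*
 * Elapsed realtime between two samples. The result is signed: a negative
 * value means the realtime clock was stepped backwards in between (or,
 * across hosts, that the two clocks disagree), and the guest clock must
 * not be advanced by it.
 */
int64_t realtime_elapsed_ns(int64_t start_ns, int64_t end_ns)
{
	return end_ns - start_ns;
}
```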




More information about the linux-arm-kernel mailing list