[PATCH v8 4/7] KVM: x86: Report host tsc and realtime values in KVM_GET_CLOCK'
Sean Christopherson
seanjc at google.com
Wed Jul 24 15:24:39 PDT 2024
/cast <Raise Skeleton>
On Wed, Jan 17, 2024, David Woodhouse wrote:
> On Thu, 2021-09-16 at 18:15 +0000, Oliver Upton wrote:
> >
> > @@ -5878,11 +5888,21 @@ static int kvm_vm_ioctl_set_clock(struct kvm *kvm, void __user *argp)
> > * is slightly ahead) here we risk going negative on unsigned
> > * 'system_time' when 'data.clock' is very small.
> > */
> > - if (kvm->arch.use_master_clock)
> > - now_ns = ka->master_kernel_ns;
> > + if (data.flags & KVM_CLOCK_REALTIME) {
> > + u64 now_real_ns = ktime_get_real_ns();
> > +
> > + /*
> > + * Avoid stepping the kvmclock backwards.
> > + */
> > + if (now_real_ns > data.realtime)
> > + data.clock += now_real_ns - data.realtime;
> > + }
> > +
> > + if (ka->use_master_clock)
> > + now_raw_ns = ka->master_kernel_ns;
>
> This looks wrong to me.
>
> > else
> > - now_ns = get_kvmclock_base_ns();
> > - ka->kvmclock_offset = data.clock - now_ns;
> > + now_raw_ns = get_kvmclock_base_ns();
> > + ka->kvmclock_offset = data.clock - now_raw_ns;
> > kvm_end_pvclock_update(kvm);
> > return 0;
> > }
>
> We use the host CLOCK_MONOTONIC_RAW plus the boot offset, as a
> 'kvmclock base clock', and get_kvmclock_base_ns() returns that. The KVM
> clocks for each VMs are based on this 'kvmclock base clock', each
> offset by a ka->kvmclock_offset which represents the time at which that
> VM was started — so each VM's clock starts from zero.
>
> The values of ka->master_kernel_ns and ka->master_cycle_now represent a
> single point in time, the former being the value of
> get_kvmclock_base_ns() at that moment and the latter being the host TSC
> value. In pvclock_update_vm_gtod_copy(), kvm_get_time_and_clockread()
> is used to return both values at precisely the same moment, from the
> *same* rdtsc().
>
> This allows the current 'kvmclock base clock' to be calculated at any
> moment by reading the TSC, calculating a delta to that reading from
> ka->master_cycle_now to determine how much time has elapsed since
> ka->master_kernel_ns. We can then add ka->kvmclock_offset to get the
> kvmclock for this particular VM.
>
> Now, looking at the code quoted above. It's given a kvm_clock_data
> struct which contains a value of the KVM clock which is to be set as
> the time "now", and all it does is adjust ka->kvmclock_offset
> accordingly. Which is really simple:
>
> now_raw_ns = get_kvmclock_base_ns();
> ka->kvmclock_offset = data.clock - now_raw_ns;
>
> Et voilà, now get_kvmclock_base_ns() + ka->kvmclock_offset at any given
> moment in time will result in a kvmclock value according to what was
> just set. Yay!
>
> Except... in the case where the TSC is constant, we actually set
> 'now_raw_ns' to a value that doesn't represent *now*. Instead, we set
> it to ka->master_kernel_ns which represents some point in the *past*.
> We should add the number of TSC ticks since ka->master_cycle_now if
> we're going to use that, surely?
Somewhat ironically, without the KVM_CLOCK_REALTIME goo, there's no need to
re-read TSC, because the rdtsc() in pvclock_update_vm_gtod_copy() *just* happened.
But the call to ktime_get_real_ns() could theoretically spin for a non-trivial
amount of time if the clock is being refreshed.
More information about the linux-arm-kernel
mailing list