[PATCH v8 4/7] KVM: x86: Report host tsc and realtime values in KVM_GET_CLOCK'

David Woodhouse dwmw2 at infradead.org
Thu Jul 25 01:24:19 PDT 2024


On Wed, 2024-07-24 at 15:24 -0700, Sean Christopherson wrote:
> /cast <Raise Skeleton>
> 
> On Wed, Jan 17, 2024, David Woodhouse wrote:
> > On Thu, 2021-09-16 at 18:15 +0000, Oliver Upton wrote:
> > > 
> > > @@ -5878,11 +5888,21 @@ static int kvm_vm_ioctl_set_clock(struct kvm *kvm, void __user *argp)
> > >          * is slightly ahead) here we risk going negative on unsigned
> > >          * 'system_time' when 'data.clock' is very small.
> > >          */
> > > -       if (kvm->arch.use_master_clock)
> > > -               now_ns = ka->master_kernel_ns;
> > > +       if (data.flags & KVM_CLOCK_REALTIME) {
> > > +               u64 now_real_ns = ktime_get_real_ns();
> > > +
> > > +               /*
> > > +                * Avoid stepping the kvmclock backwards.
> > > +                */
> > > +               if (now_real_ns > data.realtime)
> > > +                       data.clock += now_real_ns - data.realtime;
> > > +       }
> > > +
> > > +       if (ka->use_master_clock)
> > > +               now_raw_ns = ka->master_kernel_ns;
> > 
> > This looks wrong to me.
> > 
> > >         else
> > > -               now_ns = get_kvmclock_base_ns();
> > > -       ka->kvmclock_offset = data.clock - now_ns;
> > > +               now_raw_ns = get_kvmclock_base_ns();
> > > +       ka->kvmclock_offset = data.clock - now_raw_ns;
> > >         kvm_end_pvclock_update(kvm);
> > >         return 0;
> > >  }
> > 
> > We use the host CLOCK_MONOTONIC_RAW plus the boot offset, as a
> > 'kvmclock base clock', and get_kvmclock_base_ns() returns that. The KVM
> > clocks for each VMs are based on this 'kvmclock base clock', each
> > offset by a ka->kvmclock_offset which represents the time at which that
> > VM was started — so each VM's clock starts from zero.
> > 
> > The values of ka->master_kernel_ns and ka->master_cycle_now represent a
> > single point in time, the former being the value of
> > get_kvmclock_base_ns() at that moment and the latter being the host TSC
> > value. In pvclock_update_vm_gtod_copy(), kvm_get_time_and_clockread()
> > is used to return both values at precisely the same moment, from the
> > *same* rdtsc().
> > 
> > This allows the current 'kvmclock base clock' to be calculated at any
> > moment by reading the TSC, calculating a delta to that reading from
> > ka->master_cycle_now to determine how much time has elapsed since
> > ka->master_kernel_ns. We can then add ka->kvmclock_offset to get the
> > kvmclock for this particular VM.
> > 
> > Now, looking at the code quoted above. It's given a kvm_clock_data
> > struct which contains a value of the KVM clock which is to be set as
> > the time "now", and all it does is adjust ka->kvmclock_offset
> > accordingly. Which is really simple:
> > 
> >                 now_raw_ns = get_kvmclock_base_ns();
> >         ka->kvmclock_offset = data.clock - now_raw_ns;
> > 
> > Et voilà, now get_kvmclock_base_ns() + ka->kvmclock_offset at any given
> > moment in time will result in a kvmclock value according to what was
> > just set. Yay!
> > 
> > Except... in the case where the TSC is constant, we actually set
> > 'now_raw_ns' to a value that doesn't represent *now*. Instead, we set
> > it to ka->master_kernel_ns which represents some point in the *past*.
> > We should add the number of TSC ticks since ka->master_cycle_now if
> > we're going to use that, surely?
> 
> Somewhat ironically, without the KVM_CLOCK_REALTIME goo, there's no need to
> re-read TSC, because the rdtsc() in pvclock_update_vm_gtod_copy() *just* happened.
> But the call to ktime_get_real_ns() could theoretically spin for a non-trivial
> amount of time if the clock is being refreshed.

Aha, thank you. That makes sense. I note that in similar code in
https://lore.kernel.org/lkml/20240522001817.619072-4-dwmw2@infradead.org/
Jack explicitly added a comment to precisely that effect:

+	 * The call to pvclock_update_vm_gtod_copy() has created a new time
+	 * reference point in ka->master_cycle_now and ka->master_kernel_ns.

Code comments are wonderful things. Someone must have trained him well :)

(I'll be casting Raise Skeleton on that series too as soon as I have
dispatched the existing horde, FWIW.)

So the remaining problem is that we do re-read the TSC for the
KVM_CLOCK_REALTIME case, as you pointed out. When we should be
calculating the relationship between real time and the KVM clock at
precisely the *same* moment.

But I don't care about that, because this whole API is suboptimal
anyway; we should just set the KVM clock in terms of the guest TSC.
Which is what Jack's patch allows.



-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 5965 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20240725/71593ff3/attachment.p7s>


More information about the linux-arm-kernel mailing list