[PATCH v8 4/7] KVM: x86: Report host tsc and realtime values in KVM_GET_CLOCK

David Woodhouse dwmw2 at infradead.org
Wed Jan 17 06:28:51 PST 2024


On Thu, 2021-09-16 at 18:15 +0000, Oliver Upton wrote:
> 
> @@ -5878,11 +5888,21 @@ static int kvm_vm_ioctl_set_clock(struct kvm *kvm, void __user *argp)
>          * is slightly ahead) here we risk going negative on unsigned
>          * 'system_time' when 'data.clock' is very small.
>          */
> -       if (kvm->arch.use_master_clock)
> -               now_ns = ka->master_kernel_ns;
> +       if (data.flags & KVM_CLOCK_REALTIME) {
> +               u64 now_real_ns = ktime_get_real_ns();
> +
> +               /*
> +                * Avoid stepping the kvmclock backwards.
> +                */
> +               if (now_real_ns > data.realtime)
> +                       data.clock += now_real_ns - data.realtime;
> +       }
> +
> +       if (ka->use_master_clock)
> +               now_raw_ns = ka->master_kernel_ns;

This looks wrong to me.

>         else
> -               now_ns = get_kvmclock_base_ns();
> -       ka->kvmclock_offset = data.clock - now_ns;
> +               now_raw_ns = get_kvmclock_base_ns();
> +       ka->kvmclock_offset = data.clock - now_raw_ns;
>         kvm_end_pvclock_update(kvm);
>         return 0;
>  }

We use the host CLOCK_MONOTONIC_RAW plus the boot offset, as a
'kvmclock base clock', and get_kvmclock_base_ns() returns that. The KVM
clocks for each VMs are based on this 'kvmclock base clock', each
offset by a ka->kvmclock_offset which represents the time at which that
VM was started — so each VM's clock starts from zero.

The values of ka->master_kernel_ns and ka->master_cycle_now represent a
single point in time, the former being the value of
get_kvmclock_base_ns() at that moment and the latter being the host TSC
value. In pvclock_update_vm_gtod_copy(), kvm_get_time_and_clockread()
is used to return both values at precisely the same moment, from the
*same* rdtsc().

This allows the current 'kvmclock base clock' to be calculated at any
moment by reading the TSC, calculating a delta to that reading from
ka->master_cycle_now to determine how much time has elapsed since
ka->master_kernel_ns. We can then add ka->kvmclock_offset to get the
kvmclock for this particular VM.

Now, looking at the code quoted above. It's given a kvm_clock_data
struct which contains a value of the KVM clock which is to be set as
the time "now", and all it does is adjust ka->kvmclock_offset
accordingly. Which is really simple:

		now_raw_ns = get_kvmclock_base_ns();
	ka->kvmclock_offset = data.clock - now_raw_ns;

Et voilà, now get_kvmclock_base_ns() + ka->kvmclock_offset at any given
moment in time will result in a kvmclock value according to what was
just set. Yay!

Except... in the case where the TSC is constant, we actually set
'now_raw_ns' to a value that doesn't represent *now*. Instead, we set
it to ka->master_kernel_ns which represents some point in the *past*.
We should add the number of TSC ticks since ka->master_cycle_now if
we're going to use that, surely?


-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 5965 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20240117/7ee2db90/attachment.p7s>


More information about the linux-arm-kernel mailing list