[PATCH v8 4/7] KVM: x86: Report host tsc and realtime values in KVM_GET_CLOCK

Fri Oct 1 07:17:02 PDT 2021

On 01/10/21 01:02, Thomas Gleixner wrote:
> 
>   Now the proposed change is creating exactly the same problem:
> 
>   +	if (data.flags & KVM_CLOCK_REALTIME) {
>   +		u64 now_real_ns = ktime_get_real_ns();
>   +
>   +		/*
>   +		 * Avoid stepping the kvmclock backwards.
>   +		 */
>   +		if (now_real_ns > data.realtime)
>   +			data.clock += now_real_ns - data.realtime;
>   +	}

Indeed, though it's opt-in (you can always not pass KVM_CLOCK_REALTIME 
and then the kernel will not muck with the value you gave it).

> virt came along and created a hard to solve circular dependency
> problem:
> 
>    - If CLOCK_MONOTONIC stops for too long then NTP/PTP gets out of
>      sync, but everything else is happy.
>      
>    - If CLOCK_MONOTONIC jumps too far forward, then all hell breaks
>      lose, but NTP/PTP is happy.

Yes, I agree that this sums it up.

For example QEMU (meaning: Marcelo :)) has gone for the former and 
"hoping" that NTP/PTP sorts it out sooner or later.  The clock in 
nanoseconds is sent out to the destination and restored.

Google's userspace instead went for the latter.  The reason is that 
they've always started running on the destination before finishing the 
memory copy[1], therefore it's much easier to bound the CLOCK_MONOTONIC 
jump.

I do like very much the cooperative S2IDLE or even S3 way to handle the 
brownout during live migration.  However if your stopping time is 
bounded, these patches are nice because, on current processors that have 
TSC scaling, they make it possible to keep the illusion of the TSC 
running.  Of course that's a big "if"; however, you can always bound the 
stopping time by aborting the restart on the destination machine once 
you get close enough to the limit.

Paolo

[1] see https://dl.acm.org/doi/pdf/10.1145/3296975.3186415, figure 3