[PATCH] arm64: kvm: Expose timer offset directly via KVM_{GET,SET}_ONE_REG
David Woodhouse
dwmw2 at infradead.org
Thu Feb 2 07:18:55 PST 2023
On Thu, 2023-02-02 at 13:50 +0000, Marc Zyngier wrote:
> Hi Simon,
>
> On Thu, 02 Feb 2023 12:13:14 +0000,
> Simon Veith <sveith at amazon.de> wrote:
> >
> > The virtual timer count register (CNTVCT_EL0) is virtualized by
> > configuring offset register CNTVOFF_EL2 to subtract from the underlying
> > raw hardware timer count when the guest reads the current count.
> >
> > Currently, we offer userspace the ability to serialize and deserialize
> > only the absolute count register value, using KVM_{GET,SET}_ONE_REG with
> > KVM_REG_ARM_TIMER_CNT. Internally, we then compute and set the offset
> > register accordingly to obtain the requested count value.
> >
> > Allowing to set this timer count register only by absolute value poses
> > some problems to virtual machine monitors that try to maintain the
> > illusion of continuously ticking clocks to the guest: In workflows like
> > live migration or liveupdate, the timers must be increased artificially
> > to account for pause time.
>
> "must" is a pretty strong word. Given that this isn't advertised as
> stolen time to the guest, any sort of time-sensitive process (such as
> an in-guest watchdog) is likely to be ticked the wrong way if you
> start adding that time to the counter.

I'd have said that it *is* stolen time. Whether your hypervisor/VMM is
off in the weeds not running your CPU because it's in swap death, or
because it's live updating itself to a new version, it isn't running
the guest vCPUs and that time is stolen.

(Whether we actually account it to the guest as such is a quality of
implementation issue. As it is, doesn't the steal_time we report to a
guest go backwards to zero on migration with no way for the VMM to
restore it? That seems weird...)

And if the delta is so long that watchdogs tick (and if those watchdogs
don't explicitly check for something like x86's PVCLOCK_GUEST_STOPPED),
then those watchdogs are basically working as designed, surely?

> For example, QEMU doesn't do that, and wants time continuity, hence
> the current behaviour.

That's as may be, but VMMs other than QEMU can and do add a delta to
the timer before 'restoring' it, based on a delta calculated from the
wall clock. We're just asking to give those VMMs a more accurate way of
doing it, so the guest timer's relationship to both the host timer and
to actual wall clock / NTP time is *precisely* as it was before without
any drift.
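
To illustrate the arithmetic behind that point, here is a minimal sketch.
It is only a model, not kernel code. It assumes the guest-visible count is
the host count minus an offset (as the patch description says of
CNTVOFF_EL2), and that the host counter is the same before and after the
pause, as in a live update. All names and tick values are hypothetical.

```python
MASK64 = (1 << 64) - 1  # counters and offsets are 64-bit and wrap around


def guest_count(host_count, offset):
    """Guest-visible virtual count: host count minus the offset."""
    return (host_count - offset) & MASK64


def offset_for_absolute(host_count_at_ioctl, wanted_guest_count):
    """Offset that has to be derived when userspace sets an absolute count."""
    return (host_count_at_ioctl - wanted_guest_count) & MASK64


# Illustrative numbers only (assumed), all in counter ticks.
saved_offset = 400_000
host_at_pause = 1_000_000
pause_ticks = 5_000
ioctl_delay = 300  # scheduling delay between computing the value and setting it

guest_at_pause = guest_count(host_at_pause, saved_offset)

# Absolute restore: the VMM adds the pause time, but the value is applied late.
wanted = guest_at_pause + pause_ticks
host_at_ioctl = host_at_pause + pause_ticks + ioctl_delay
abs_offset = offset_for_absolute(host_at_ioctl, wanted)

# Offset restore: the saved offset is written back unchanged.
host_later = host_at_ioctl + 10_000
expected = guest_count(host_later, saved_offset)
lag = (expected - guest_count(host_later, abs_offset)) & MASK64
print(lag)  # the guest is behind by exactly the ioctl delay
```

In this model the absolute restore leaves the guest behind by the delay
between computing the value and applying it, while writing the saved offset
back leaves the relationship to the host counter unchanged.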

I may yet fix QEMU; I've been working on the Xen PV timer support
across migration, and it's all a bit weird how an immediate stop/start
has it whining about clock skew and TSC instability; I can't tell
what's my bug and what was already broken.

> > Any delays between userspace computing the correct timer count value and
> > actually setting it in kernel space by KVM_SET_ONE_REG (such as can be
> > incurred by scheduling) become visible as under-accounted pause time in
> > the guest, meaning the guest observes that its system clock seems to
> > have fallen behind its NTP time reference.
> >
> > The issue is further complicated when vCPU setup is performed by
> > independent threads which may experience different delays, leading to
> > jitter between the clocks of different vCPUs.
>
> How? I really hope that you will have restored *all* the vcpus before
> restarting any. If you don't, your userspace is buggy.

But the timer counts from the epoch of when the KVM itself was
initialised, doesn't it? I haven't looked hard at the arm side but on
x86 if the various vCPU threads all use the "TSC is <x> now" API, they
only end up in sync because there's a hack for "if it's within one
second of the previously-set vCPU's TSC, make it precisely match". And
then they're only in sync with each *other* rather than what they were
before the live update.

Actually *running* the vCPUs comes later, of course.
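
The "within one second, make it match" behaviour described above can be
modelled roughly as follows. This is a simplified, hypothetical sketch and
not the actual x86 KVM code: the class, method names and the 1 GHz tick
rate are assumptions, and the offset uses the x86-style convention of
guest = host + offset.

```python
TICKS_PER_SECOND = 1_000_000_000  # assumed 1 GHz counter, for illustration only


class VmModel:
    """Hypothetical model of a per-VM 'counter is <x> now' setter."""

    def __init__(self):
        self.last_write = None   # (value written, host count at that write)
        self.last_offset = None  # offset chosen for that write

    def set_count_now(self, host_count, value):
        """Return the offset chosen for a vCPU that asks for `value` now."""
        if self.last_write is not None:
            prev_value, prev_host = self.last_write
            # Where the previously written counter would have got to by now.
            projected = prev_value + (host_count - prev_host)
            if abs(value - projected) < TICKS_PER_SECOND:
                # Close enough: snap to the previous vCPU's offset.
                return self.last_offset
        self.last_write = (value, host_count)
        self.last_offset = value - host_count
        return self.last_offset


vm = VmModel()
# Two vCPU threads restore the same saved value, the second one 400k ticks late.
off0 = vm.set_count_now(host_count=5_000_000_000, value=2_000_000_000)
off1 = vm.set_count_now(host_count=5_000_400_000, value=2_000_000_000)
# A write far from the previous one is taken at face value instead.
off2 = vm.set_count_now(host_count=5_000_500_000, value=9_000_000_000)
print(off0 == off1, off2)
```

In this model the two vCPUs end up with identical offsets, but that offset
is whatever the first thread's timing produced, so they agree with each
other rather than with the pre-update relationship to the host counter.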
>