[PATCH v8 0/7] KVM: x86: Add idempotent controls for migrating system counter state

Oliver Upton oupton at google.com
Thu Sep 16 11:15:31 PDT 2021


KVM's current means of saving/restoring system counters is plagued with
temporal issues. On x86, we migrate the guest's system counter by-value
through the respective guest's IA32_TSC value. Restoring system counters
by-value is brittle as the state is not idempotent: the host system
counter keeps advancing between the attempted save and restore.
Furthermore, VMMs may wish to transparently live migrate guest VMs,
meaning that they include the elapsed time due to live migration blackout
in the guest system counter view. The VMM thread could be preempted for
any number of reasons (scheduler, L0 hypervisor under nested) between the
time that it calculates the desired guest counter value and when
KVM actually sets this counter state.

Despite the value-based interface that we present to userspace, KVM
actually has idempotent guest controls by way of the TSC offset.
We can avoid all of the issues associated with a value-based interface
by abstracting these offset controls in a new device attribute. This
series introduces new vCPU device attributes to provide userspace access
to the vCPU's system counter offset.
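
As a rough illustration of the difference (a simplified model, not code
from this series), treat the guest counter as the host counter plus an
offset and ignore TSC scaling. The helper names below are hypothetical.

```c
#include <stdint.h>

/*
 * Simplified model: the guest's view of the counter is the host counter
 * plus an offset. Unsigned arithmetic wraps modulo 2^64.
 */
static inline uint64_t guest_tsc(uint64_t host_tsc, uint64_t offset)
{
	return host_tsc + offset;
}

/*
 * Value-based restore: the offset is derived from the host counter at
 * the moment the write lands, not at the moment the VMM computed the
 * value it wanted the guest to see.
 */
static inline uint64_t offset_from_value(uint64_t guest_value,
					 uint64_t host_tsc_at_write)
{
	return guest_value - host_tsc_at_write;
}
```

If the VMM computes a guest value at one host counter reading and the
write lands some cycles later, the guest counter ends up behind by that
delay. Writing the offset itself gives the same result no matter when
the write lands.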

Patches 1-2 are Paolo's refactorings around locking and the
KVM_{GET,SET}_CLOCK ioctls.

Patch 3 cures a race where use_master_clock is read outside of the
pvclock lock in the KVM_GET_CLOCK ioctl.

Patch 4 adopts Paolo's suggestion, augmenting the KVM_{GET,SET}_CLOCK
ioctls to provide userspace with a (host_tsc, realtime) instant. This is
essential for a VMM to perform precise migration of the guest's system
counters.
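
A VMM could combine two such instants, one from the source and one from
the destination, along these lines. This is a sketch under stated
assumptions, not the procedure documented by the series: it assumes
both hosts agree on realtime, that the destination sample is taken
after the source sample, and that no TSC scaling is in effect. The
struct and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical container for one (host_tsc, realtime) sample. */
struct clock_instant {
	uint64_t host_tsc;	/* host TSC, in cycles */
	uint64_t realtime_ns;	/* host realtime clock, in nanoseconds */
};

/*
 * Compute the offset to apply on the destination so that the guest
 * counter also accounts for the time that elapsed between the samples.
 * The multiplication can overflow for very long blackouts; a real VMM
 * would want a wider intermediate type.
 */
static inline uint64_t dest_tsc_offset(uint64_t src_offset,
				       struct clock_instant src,
				       struct clock_instant dst,
				       uint64_t guest_tsc_khz)
{
	uint64_t elapsed_ns = dst.realtime_ns - src.realtime_ns;
	/* kHz is cycles per millisecond, so cycles = ns * kHz / 10^6 */
	uint64_t elapsed_cycles = elapsed_ns * guest_tsc_khz / 1000000;
	/* Value the guest counter should show at the destination sample */
	uint64_t guest_tsc_now = src.host_tsc + src_offset + elapsed_cycles;

	return guest_tsc_now - dst.host_tsc;
}
```

Because the result is an offset rather than a value, it does not matter
how long the VMM takes between computing it and handing it to KVM.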

Patch 5 does away with the pvclock spin lock in favor of a sequence
lock based on the tsc_write_lock. The original patch is from Paolo; I
touched it up a bit to fix a deadlock and some unused variables that
caused -Werror to scream.

Patch 6 extracts the TSC synchronization tracking code in a way that it
can be used for both offset-based and value-based TSC synchronization
schemes.

Finally, patch 7 implements a vCPU device attribute which allows VMMs to
get at the TSC offset of a vCPU.
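
From userspace, reading the attribute might look roughly like the
following. The group and attribute names KVM_VCPU_TSC_CTRL and
KVM_VCPU_TSC_OFFSET are not spelled out in this cover letter, so they
are assumptions here, as are the fallback values used when the
installed headers do not define them. vcpu_get_tsc_offset() is a
hypothetical helper. KVM_HAS_DEVICE_ATTR and KVM_GET_DEVICE_ATTR are
the existing device attribute ioctls.

```c
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Assumed names and values, only used if the headers lack them. */
#ifndef KVM_VCPU_TSC_CTRL
#define KVM_VCPU_TSC_CTRL 0
#endif
#ifndef KVM_VCPU_TSC_OFFSET
#define KVM_VCPU_TSC_OFFSET 0
#endif

/*
 * Read the TSC offset of a vCPU into *offset.
 * Returns 0 on success, -1 on failure (errno is set by ioctl).
 */
static int vcpu_get_tsc_offset(int vcpu_fd, uint64_t *offset)
{
	struct kvm_device_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.group = KVM_VCPU_TSC_CTRL;
	attr.attr = KVM_VCPU_TSC_OFFSET;
	/* addr carries a userspace pointer to the 64-bit result */
	attr.addr = (uint64_t)(uintptr_t)offset;

	/* Probe first so that older kernels fail cleanly. */
	if (ioctl(vcpu_fd, KVM_HAS_DEVICE_ATTR, &attr) < 0)
		return -1;

	return ioctl(vcpu_fd, KVM_GET_DEVICE_ATTR, &attr);
}
```

Setting the offset on the destination would presumably use
KVM_SET_DEVICE_ATTR with the same attribute description.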

This series was tested with the new KVM selftests for the KVM clock and
system counter offset controls on Haswell hardware. Kernel was built
with CONFIG_LOCKDEP given the new locking changes/lockdep assertions
here.

Note that these tests are mailed as a separate series due to the
dependencies in both x86 and arm64.

Applies cleanly to 5.15-rc1

v7: http://lore.kernel.org/r/20210816001130.3059564-1-oupton@google.com

v7 -> v8:
 - Rebased to 5.15-rc1
 - Picked up Paolo's version of the series, which includes locking
   changes
 - Make KVM advertise KVM_CAP_VCPU_ATTRIBUTES

Oliver Upton (4):
  KVM: x86: Fix potential race in KVM_GET_CLOCK
  KVM: x86: Report host tsc and realtime values in KVM_GET_CLOCK
  KVM: x86: Refactor tsc synchronization code
  KVM: x86: Expose TSC offset controls to userspace

Paolo Bonzini (3):
  kvm: x86: abstract locking around pvclock_update_vm_gtod_copy
  KVM: x86: extract KVM_GET_CLOCK/KVM_SET_CLOCK to separate functions
  kvm: x86: protect masterclock with a seqcount

 Documentation/virt/kvm/api.rst          |  42 ++-
 Documentation/virt/kvm/devices/vcpu.rst |  57 +++
 arch/x86/include/asm/kvm_host.h         |  12 +-
 arch/x86/include/uapi/asm/kvm.h         |   4 +
 arch/x86/kvm/x86.c                      | 458 ++++++++++++++++--------
 include/uapi/linux/kvm.h                |   7 +-
 6 files changed, 419 insertions(+), 161 deletions(-)

-- 
2.33.0.309.g3052b89438-goog



