[RFC PATCH 2/2] KVM: arm64: export cntvoff in debugfs

Fri Nov 19 05:31:11 PST 2021

On Fri, 19 Nov 2021 12:59:46 +0000,
Marcelo Tosatti <mtosatti at redhat.com> wrote:
> 
> On Fri, Nov 19, 2021 at 12:17:00PM +0000, Marc Zyngier wrote:
> > On Fri, 19 Nov 2021 10:21:18 +0000,
> > Nicolas Saenz Julienne <nsaenzju at redhat.com> wrote:
> > > 
> > > While using cntvct as the raw clock for tracing, it's possible to
> > > synchronize host/guest traces just by knowing the virtual offset applied
> > > to the guest's virtual counter.
> > > 
> > > This is also the case on x86 when TSC is available. The offset is
> > > exposed in debugfs as 'tsc-offset' on a per vcpu basis. So let's
> > > implement the same for arm64.
> > 
> > How does this work with NV, where the guest hypervisor is in control
> > of the virtual offset? How does userspace knows which vcpu to pick so
> > that it gets the right offset?
> 
> On x86, the offsets for different vcpus are the same due to the logic at
> kvm_synchronize_tsc function:
> 
> During guest vcpu creation, when the TSC-clock values are written
> in a short window of time (or the clock value is zero), the code uses
> the same TSC.
> 
> This logic is problematic (since "short window of time" is a heuristic 
> which can fail), and is being replaced by writing the same offset
> for each vCPU:
> 
> commit 828ca89628bfcb1b8f27535025f69dd00eb55207
> Author: Oliver Upton <oupton at google.com>
> Date:   Thu Sep 16 18:15:38 2021 +0000
> 
>     KVM: x86: Expose TSC offset controls to userspace
>     
>     To date, VMM-directed TSC synchronization and migration has been a bit
>     messy. KVM has some baked-in heuristics around TSC writes to infer if
>     the VMM is attempting to synchronize. This is problematic, as it depends
>     on host userspace writing to the guest's TSC within 1 second of the last
>     write.
>     
>     A much cleaner approach to configuring the guest's views of the TSC is to
>     simply migrate the TSC offset for every vCPU. Offsets are idempotent,
>     and thus not subject to change depending on when the VMM actually
>     reads/writes values from/to KVM. The VMM can then read the TSC once with
>     KVM_GET_CLOCK to capture a (realtime, host_tsc) pair at the instant when
>     the guest is paused.
> 
> So with that in place, the answer to 
> 
> How does userspace knows which vcpu to pick so
> that it gets the right offset?
> 
> is any vcpu, since the offsets are the same.

As I just said above, this assertion doesn't hold true once you have
nested virt, because the offset is per-cpu, and is adjusted to mean
different things on different hypervisors (some hypervisors expose
stolen time through it, for example).

What this patch is doing is to expose a Linux-specific behaviour, and
try to derive properties from it. It really doesn't work in general.

> 
> > I also wonder why we need this when userspace already has direct
> > access to that information without any extra kernel support (read the
> > CNTVCT view of the vcpu using the ONEREG API, subtract it from the
> > host view of the counter, job done).
> 
> If guest has access to the clock offset (between guest and host), then
> in the guest:
> 
> 	clockval = hostclockval - clockoffset
> 
> Adding "clockoffset" to that will retrieve the host clock.
> 
> Is that what you mean?

No. The *VMM* (qemu, kvmtool, crosvm, insertyourfavouriteonehere) has
already access to it. Why do we need an extra interface?

	M.

-- 
Without deviation from the norm, progress is not possible.