[PATCH 0/7] KVM: arm64: add more event counters for kvm_stat

Tue Mar 23 18:04:08 GMT 2021

Hi Yoan,

On Fri, 19 Mar 2021 16:17:04 +0000,
Yoan Picchi <yoan.picchi at arm.com> wrote:
> 
> Hi all,
> 
> As mentioned in the KVM forum talk from 2019
> (https://kvmforum2019.sched.com/event/Tmwf/kvmstat-and-beyond-past-present-and-future-of-performance-monitoring-christian-borntrager-ibm
> page 10), there is few event counters for kvm_stat in the arm64
> version of kvm when you compare it to something like the x86
> version.

Crucially, these differences exist because there is no universal
equivalence between architecture features. A TLB invalidation on x86
doesn't do the same thing as on PPC nor arm64.

> Those counters are used in kvm_stat by kernel/driver developers to
> have a rough idea of the impact of their patches on the general performance.
> An example would be to make sure a patch don't increase to much the amount
> of interruptions. Those patches aim to add more counters to make the use of
> kvm_stat more relevant when measuring performance impact.

Adding more counters only make sense if the semantic of these counters
is clearly defined. In this series, you have sprayed a bunch of x++ in
random places, without defining what you were trying to count.

> I am new in working on kernel-related things and I am learning kvm as I go.
> Some of the counter I added early (memory_slot_unmaped, stage2_unmap_vm)
> no longer seems relevant because while they do interesting things, they
> happens in very specific scenarios. Instead of just deleting them, I prefer
> to ask weither a little-used counter or no counter is the best.

It works the other way around: the onus is on you to explain *why* we
should even consider a counter or another.

May I suggest that you start by picking exactly *one* metric, work out
exactly what it is supposed to count, what significance it has at the
architectural level, what it actually brings to the user? Then try and
make a good job at implementing these semantics. You will learn about
the arm64 architecture and KVM in one swift go, one area at a time.

> I can also use some suggestion on how to test those counters as some like
> remote_tlb_flush which mostly happen when fixing up a race condition; or
> what uncovered event could be worth adding in a future patch set.

remote_tlb_flush is a great example of something that really *doesn't*
make much sense on KVM/arm64. We don't deal with remote TLBs
independently of the local ones outside of vcpu migration (which you
don't cover either).

I can only urge you to focus on the architectural meaning of the
metric you picked, and see how it maps across the hypervisor.

Finally, there is the question of the general availability of these
counters. They live in debugfs, which isn't a proper userspace
ABI. There is work going on around making this a more palatable
interface, and I'd rather see where this is going before expanding the
number of counters.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.