[PATCH 0/9] KVM: arm64: PMU: Fixing chained events, and PMUv3p5 support

Ricardo Koller ricarkol at google.com
Fri Aug 12 15:53:44 PDT 2022


On Thu, Aug 11, 2022 at 01:56:21PM +0100, Marc Zyngier wrote:
> On Wed, 10 Aug 2022 22:55:03 +0100,
> Ricardo Koller <ricarkol at google.com> wrote:
> > 
> > On Wed, Aug 10, 2022 at 02:33:53PM -0500, Oliver Upton wrote:
> > > Hi Ricardo,
> > > 
> > > On Wed, Aug 10, 2022 at 11:46:22AM -0700, Ricardo Koller wrote:
> > > > On Fri, Aug 05, 2022 at 02:58:04PM +0100, Marc Zyngier wrote:
> > > > > Ricardo recently reported[1] that our PMU emulation was busted when it
> > > > > comes to chained events, as we cannot expose the overflow on a 32bit
> > > > > boundary (which the architecture requires).
> > > > > 
> > > > > This series aims at fixing this (by deleting a lot of code), and as a
> > > > > bonus adds support for PMUv3p5, as this requires us to fix a few more
> > > > > things.
> > > > > 
> > > > > Tested on A53 (PMUv3) and FVP (PMUv3p5).
> > > > > 
> > > > > [1] https://lore.kernel.org/r/20220805004139.990531-1-ricarkol@google.com
> > > > > 
> > > > > Marc Zyngier (9):
> > > > >   KVM: arm64: PMU: Align chained counter implementation with
> > > > >     architecture pseudocode
> > > > >   KVM: arm64: PMU: Distinguish between 64bit counter and 64bit overflow
> > > > >   KVM: arm64: PMU: Only narrow counters that are not 64bit wide
> > > > >   KVM: arm64: PMU: Add counter_index_to_*reg() helpers
> > > > >   KVM: arm64: PMU: Simplify setting a counter to a specific value
> > > > >   KVM: arm64: PMU: Move the ID_AA64DFR0_EL1.PMUver limit to VM creation
> > > > >   KVM: arm64: PMU: Allow ID_AA64DFR0_EL1.PMUver to be set from userspace
> > > > >   KVM: arm64: PMU: Implement PMUv3p5 long counter support
> > > > >   KVM: arm64: PMU: Allow PMUv3p5 to be exposed to the guest
> > > > > 
> > > > >  arch/arm64/include/asm/kvm_host.h |   1 +
> > > > >  arch/arm64/kvm/arm.c              |   6 +
> > > > >  arch/arm64/kvm/pmu-emul.c         | 372 ++++++++++--------------------
> > > > >  arch/arm64/kvm/sys_regs.c         |  65 +++++-
> > > > >  include/kvm/arm_pmu.h             |  16 +-
> > > > >  5 files changed, 208 insertions(+), 252 deletions(-)
> > > > > 
> > > > > -- 
> > > > > 2.34.1
> > > > > 
> > > > 
> > > > Hi Marc,
> > > > 
> > > > There is one extra potential issue with exposing PMUv3p5. I see this
> > > > weird behavior when doing passthrough ("bare metal") on the fast-model
> > > > configured to emulate PMUv3p5: the upper [63:32] half of a counter that
> > > > overflows at 32 bits is still incremented.
> > > > 
> > > >   Fast model - ARMv8.5:
> > > >    
> > > > 	Assuming the initial state is even=0xFFFFFFFF and odd=0x0,
> > > > 	incrementing the even counter leads to:
> > > > 
> > > > 	0x00000001_00000000	0x00000000_00000001		0x1
> > > > 	even counter		odd counter			PMOVSET
> > > > 
> > > > 	Assuming the initial state is even=0xFFFFFFFF and odd=0xFFFFFFFF,
> > > > 	incrementing the even counter leads to:
> > > > 
> > > > 	0x00000001_00000000	0x00000001_00000000		0x3
> > > > 	even counter		odd counter			PMOVSET
> > > 
> > > This is to be expected, actually. PMUv3p5 counters are always 64 bits wide,
> > > regardless of the configured overflow width.
> > > 
> > > DDI 0487H D8.3 Behavior on overflow
> > > 
> > >   If FEAT_PMUv3p5 is implemented, 64-bit event counters are implemented,
> > >   HDCR.HPMN is not 0, and either n is in the range [0 .. (HDCR.HPMN-1)]
> > >   or EL2 is not implemented, then event counter overflow is configured
> > >   by PMCR.LP:
> > > 
> > >   — When PMCR.LP is set to 0, if incrementing PMEVCNTR<n> causes an unsigned
> > >     overflow of bits [31:0] of the event counter, the PE sets PMOVSCLR[n] to 1.
> > >   — When PMCR.LP is set to 1, if incrementing PMEVCNTR<n> causes an unsigned
> > >     overflow of bits [63:0] of the event counter, the PE sets PMOVSCLR[n] to 1.
> > > 
> > >   [...]
> > > 
> > >   For all 64-bit counters, incrementing the counter is the same whether an
> > >   unsigned overflow occurs at [31:0] or [63:0]. If the counter increments
> > >   for an event, bits [63:0] are always incremented.
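
(To convince myself of that rule, here is a tiny userspace model of the
behaviour quoted above, purely illustrative and not KVM code: with
FEAT_PMUv3p5 the counter always increments over bits [63:0], and
PMCR_EL0.LP only selects whether the overflow flag is raised on a
32-bit or a 64-bit wrap. Feeding it the even=0xFFFFFFFF case from my
earlier mail gives 0x00000001_00000000 with the overflow bit set, which
matches what the fast model does.)

	#include <inttypes.h>
	#include <stdbool.h>
	#include <stdint.h>
	#include <stdio.h>

	struct pmu_counter {
		uint64_t val;
	};

	/* Bits [63:0] always increment; LP only picks the overflow width. */
	static bool counter_increment(struct pmu_counter *c, bool pmcr_lp)
	{
		uint64_t old = c->val++;

		if (pmcr_lp)
			return old == UINT64_MAX;	/* wrap of bits [63:0] */

		return (uint32_t)old == UINT32_MAX;	/* wrap of bits [31:0] */
	}

	int main(void)
	{
		struct pmu_counter even = { .val = 0xFFFFFFFFULL };

		/* LP=0: overflow flagged at [31:0], but [63:32] still increments */
		bool ovf = counter_increment(&even, false);

		printf("even=0x%016" PRIx64 " overflow=%d\n", even.val, ovf);
		/* prints: even=0x0000000100000000 overflow=1 */
		return 0;
	}
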
> > > 
> > > Do you see this same (expected) failure w/ Marc's series?
> > 
> > I don't know, I'm hitting another bug it seems.
> > 
> > Just realized that KVM does not offer PMUv3p5 (with this series applied)
> > when the real hardware is only Armv8.2 (the setup I originally tried).
> > So, I tried these other two setups on the fast model:
> > 
> > has_arm_v8-5=1
> > 
> > 	# ./lkvm-static run --nodefaults --pmu pmu.flat -p pmu-chained-sw-incr
> > 	# lkvm run -k pmu.flat -m 704 -c 8 --name guest-135
> > 
> > 	INFO: PMU version: 0x6
> >                            ^^^
> >                            PMUv3 for Armv8.5
> > 	INFO: PMU implementer/ID code: 0x41("A")/0
> > 	INFO: Implements 8 event counters
> > 	FAIL: pmu: pmu-chained-sw-incr: overflow and chain counter incremented after 100 SW_INCR/CHAIN
> > 	INFO: pmu: pmu-chained-sw-incr: overflow=0x0, #0=4294967380 #1=0
> >                                                  ^^^
> >                                                  no overflows
> > 	FAIL: pmu: pmu-chained-sw-incr: expected overflows and values after 100 SW_INCR/CHAIN
> > 	INFO: pmu: pmu-chained-sw-incr: overflow=0x0, #0=84 #1=-1
> > 	INFO: pmu: pmu-chained-sw-incr: overflow=0x0, #0=4294967380 #1=4294967295
> > 	SUMMARY: 2 tests, 2 unexpected failures
> 
> Hmm. I think I see what's wrong. In kvm_pmu_create_perf_event(), we
> have this:
> 
> 	if (kvm_pmu_idx_is_64bit(vcpu, select_idx))
> 		attr.config1 |= 1;
> 
> 	counter = kvm_pmu_get_counter_value(vcpu, select_idx);
> 
> 	/* The initial sample period (overflow count) of an event. */
> 	if (kvm_pmu_idx_has_64bit_overflow(vcpu, select_idx))
> 		attr.sample_period = (-counter) & GENMASK(63, 0);
> 	else
> 		attr.sample_period = (-counter) & GENMASK(31, 0);
> 
> but the initial sampling period shouldn't be based on the *guest*
> counter overflow. It really is about getting to an overflow on the
> *host*, so the initial code was correct, and only the width of the
> counter matters here.

Right, I think this requires bringing back some of the chaining-related
code (like update_pmc_chained() and pmc_is_chained()), because

	attr.sample_period = (-counter) & GENMASK(31, 0);

should also be used when the counter is chained.
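
Something like the below, roughly (just a sketch to illustrate the idea,
not a patch; pmc_is_chained() is one of the helpers this series removes,
and pmc here would be the kvm_pmc for select_idx, so the exact shape
depends on how much of that code comes back):

	counter = kvm_pmu_get_counter_value(vcpu, select_idx);

	/*
	 * Base the period on the width of the counter on the host
	 * side: a chained counter should get the 32-bit period too,
	 * even if the guest-visible counter is 64 bits wide.
	 */
	if (kvm_pmu_idx_is_64bit(vcpu, select_idx) && !pmc_is_chained(pmc))
		attr.sample_period = (-counter) & GENMASK(63, 0);
	else
		attr.sample_period = (-counter) & GENMASK(31, 0);
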

Thanks,
Ricardo

> 
> /me goes back to running the FVP...
> 
> Thanks,
> 
> 	M.
> 
> -- 
> Without deviation from the norm, progress is not possible.


