[RFC PATCH 0/6] ARM64: KVM: PMU infrastructure support

Sun Feb 15 07:33:37 PST 2015

Hi Anup,

On Mon, Jan 12, 2015 at 09:49:13AM +0530, Anup Patel wrote:
> On Mon, Jan 12, 2015 at 12:41 AM, Christoffer Dall
> <christoffer.dall at linaro.org> wrote:
> > On Tue, Dec 30, 2014 at 11:19:13AM +0530, Anup Patel wrote:
> >> (dropping previous conversation for easy reading)
> >>
> >> Hi Marc/Christoffer,
> >>
> >> I tried implementing PMU context-switch via C code
> >> in EL1 mode and in atomic context with irqs disabled.
> >> The context switch itself works perfectly fine but
> >> irq forwarding is not clean for PMU irq.
> >>
> >> I found another issue: the GIC only samples irq
> >> lines if they are enabled. This means that for using
> >> irq forwarding we will need to ensure that the host PMU
> >> irq is enabled.  The arch_timer code does this by
> >> doing request_irq() for the host virtual timer interrupt.
> >> For PMU, we can either enable/disable the host PMU
> >> irq in the context switch, or we need to have a shared
> >> irq handler between the KVM PMU and the host kernel PMU.
> >
> > Could we simply require the host PMU driver to request the IRQ and have
> > the driver inject the corresponding IRQ into the VM via a mechanism
> > similar to VFIO, using an eventfd and irqfds etc.?
> 
> Currently, the host PMU driver does request_irq() only when
> there is some event to be monitored. This means host will do
> request_irq() only when we run perf application on host
> user space.
> 
> Initially, I thought that we could simply pass IRQF_SHARED
> to request_irq() in the host PMU driver and do the same for
> request_irq() in the KVM PMU code, but the PMU irq can be
> an SPI or a PPI. If the PMU irq is an SPI then the IRQF_SHARED
> flag would be fine, but if it's a PPI then we have no way to
> set the IRQF_SHARED flag because request_percpu_irq()
> does not have an irq flags parameter.
> 
> >
> > (I haven't quite thought through if there's a way for the host PMU
> > driver to distinguish between an IRQ for itself and one for the guest,
> > though).
> >
> > It does feel like we will need some sort of communication/coordination
> > between the host PMU driver and KVM...
> >
> >>
> >> I have rethought our discussion so far. I
> >> understand that we need KVM PMU virtualization
> >> to meet the following criteria:
> >> 1. No modification in host PMU driver
> >
> > Is this really a strict requirement?  One of the advantages of KVM
> > should be that the rest of the kernel is supportive of KVM.
> 
> I guess so, because the host PMU driver should not do things
> differently for host and guest. I think this is the reason why
> we discarded the mask/unmask PMU irq approach which
> I had implemented in RFC v1.
> 
> >
> >> 2. No modification in guest PMU driver
> >> 3. No mask/unmask dance for sharing host PMU irq
> >> 4. Clean way to avoid infinite VM exits due to
> >> PMU interrupt
> >>
> >> I have discovered a new approach, which is as follows:
> >> 1. Context switch PMU in atomic context (i.e. local_irq_disable())
> >> 2. Ensure that host PMU irq is disabled when entering guest
> >> mode and re-enable host PMU irq when exiting guest mode if
> >> it was enabled previously.
> >
> > What does this look like, software-engineering-wise?  Would you be looking
> > up the IRQ number from the DT in the KVM code again?  How does KVM then
> > synchronize with the host PMU driver so they're not both requesting the
> > same IRQ at the same time?
> 
> We only look up host PMU irq numbers from DT at HYP init time.
> 
> During the context switch we know the host PMU irq number for
> the current host CPU, so we can get the state of the host PMU irq in
> the context switch code.
> 
> If we go by the shared irq handler approach then both KVM
> and the host PMU driver will do request_irq() on the same host
> PMU irq. In other words, there is no virtual PMU irq provided
> by HW for the guest.
> 

Sorry for the *really* long delay in this response.

We had a chat about this subject with Will Deacon and Marc Zyngier
during Connect, and we identified a number of problems
with the current approach:

1. As you pointed out, there is a need for a shared IRQ handler, and
   there is no immediately nice way to implement this without a more
   sophisticated perf/kvm interface, probably comprising eventfds or
   something similar.

2. Hijacking the counters for the VM without perf knowing about it
   basically makes it impossible to do system-wide event counting, an
   important use case for a virtualization host.
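Problem 1 can be made concrete with a small user-space model of one interrupt line with several handlers attached. It assumes, hypothetically, that each party (host perf, KVM) can tell its own overflows apart by which counters it owns, which is exactly the open question raised earlier in the thread; struct pmu_irq_line, line_attach() and line_dispatch() are illustrative names, not kernel APIs. A second handler is refused on a PPI (INTIDs 16-31), reflecting that request_percpu_irq() takes no IRQF_SHARED flag, while an SPI (INTIDs 32-1019) may be shared as with request_irq(..., IRQF_SHARED, ...). The dispatch loop calls every handler on the line and reports IRQ_HANDLED if any of them claimed an overflow bit, roughly how shared lines behave in the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the kernel's irqreturn_t convention for handlers on a shared line. */
enum irqreturn { IRQ_NONE = 0, IRQ_HANDLED = 1 };

#define MAX_HANDLERS 4

/* Hypothetical: each party (host perf, KVM) knows which counters it owns. */
struct pmu_handler {
    uint32_t owned_mask;  /* counter indices this party has programmed */
    uint32_t serviced;    /* overflow bits this party has acknowledged */
};

/* Hypothetical model of one PMU interrupt line and its attached handlers. */
struct pmu_irq_line {
    unsigned int hwirq;   /* GIC interrupt ID (INTID) */
    struct pmu_handler *handlers[MAX_HANDLERS];
    size_t n;
};

/* SPIs (INTIDs 32-1019) can be requested with IRQF_SHARED via request_irq();
 * PPIs (16-31) go through request_percpu_irq(), which takes no flags, so a
 * second requester on a PPI is refused here. */
static int line_attach(struct pmu_irq_line *l, struct pmu_handler *h)
{
    int shareable = l->hwirq >= 32 && l->hwirq < 1020;

    if (l->n >= MAX_HANDLERS || (l->n > 0 && !shareable))
        return -1;
    l->handlers[l->n++] = h;
    return 0;
}

/* Call every handler on the line; each claims only its own overflow bits.
 * The line counts as handled if at least one handler claimed something. */
static enum irqreturn line_dispatch(struct pmu_irq_line *l, uint32_t *ovs)
{
    enum irqreturn ret = IRQ_NONE;

    assert(l->n <= MAX_HANDLERS);
    for (size_t i = 0; i < l->n; i++) {
        struct pmu_handler *h = l->handlers[i];
        uint32_t mine = *ovs & h->owned_mask;

        if (!mine)
            continue;         /* this handler would return IRQ_NONE */
        h->serviced |= mine;
        *ovs &= ~mine;        /* acknowledge only our own overflow bits */
        ret = IRQ_HANDLED;
    }
    return ret;
}
```

With host perf owning counters 0-1 (mask 0x3) and KVM owning counters 2-3 (mask 0xC) on an SPI, an overflow on counter 2 is claimed by KVM alone; on a PPI the second line_attach() simply fails, which is the gap that a more sophisticated perf/kvm interface would have to fill.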

So the approach we will be taking now would be to:

First, implement a strict trap-and-emulate approach in software.  This
would allow any software relying on access to performance counters to
work, although potentially with slightly imprecise values.  This is the
approach taken by x86 and would be significantly simpler to support on
systems like big.LITTLE as well.
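The trap-and-emulate idea can be sketched as follows: guest accesses to a counter register trap to the hypervisor, which never exposes the physical counter and instead derives a guest-visible value from a host-side count, in the spirit of the x86 approach of backing virtual counters with host perf events. This is a user-space model under stated assumptions: struct vpmu_counter, vpmu_write() and vpmu_read() are hypothetical names, host_count stands in for whatever the host-side event reports, and overflow interrupt emulation is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one emulated guest counter: the guest never sees
 * the physical register; every access traps and is emulated from a
 * host-side count (e.g. a host perf event created on the guest's behalf). */
struct vpmu_counter {
    uint64_t host_count;  /* stand-in for the host-side running count */
    uint64_t offset;      /* guest value = (host_count + offset) & mask */
    uint64_t mask;        /* counter width, e.g. 0xffffffff for 32 bits */
};

/* Trapped guest write: record the offset so later reads are relative to it.
 * Unsigned wrap-around in the subtraction is intended. */
static void vpmu_write(struct vpmu_counter *c, uint64_t guest_val)
{
    c->offset = (guest_val & c->mask) - c->host_count;
}

/* Trapped guest read: reconstruct the guest-visible value. */
static uint64_t vpmu_read(const struct vpmu_counter *c)
{
    assert(c->mask != 0);
    return (c->host_count + c->offset) & c->mask;
}
```

Because only an offset is stored, the host keeps full ownership of the hardware (and of system-wide counting), and the guest still sees a counter that starts where it wrote it and wraps at its own width.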

Second, if values obtained from within the guest turn out to be so
skewed by the trap-and-emulate approach that we need to give the guest
access to the counters, we should try to share the hardware by partitioning
the physical counters; but again, we need to coordinate with the host
perf system for this.  We would only pursue this approach if
absolutely necessary.
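A sketch of what partitioning could mean, assuming the physical event counters are simply split by index: the guest is given the low n_guest counters, host perf keeps the rest, and a trapped guest access outside its share would be rejected or emulated. struct pmu_partition, pmu_split() and guest_may_access() are hypothetical names; the 31-counter limit reflects the ARMv8 PMU, which provides at most 31 event counters plus the cycle counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical split of the physical event counters by index. */
struct pmu_partition {
    uint32_t guest_mask;  /* counters handed to the guest */
    uint32_t host_mask;   /* counters left to host perf */
};

/* Give the guest counters [0, n_guest) out of n; the host keeps the rest. */
static struct pmu_partition pmu_split(unsigned int n, unsigned int n_guest)
{
    assert(n <= 31 && n_guest <= n);  /* ARMv8: at most 31 event counters */
    uint32_t all = (1u << n) - 1u;
    uint32_t guest = (1u << n_guest) - 1u;
    struct pmu_partition p = { guest, all & ~guest };
    return p;
}

/* Check a trapped guest access against its share of the counters. */
static bool guest_may_access(const struct pmu_partition *p, unsigned int idx)
{
    return idx < 32 && (p->guest_mask & (1u << idx)) != 0;
}
```

The masks themselves are trivial; the hard part, as noted above, is getting host perf to agree never to schedule events on the guest's share.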

Apologies for the change in direction on this.

What are your thoughts?  Do you still have time/interest to work
on any of this?

Thanks,
-Christoffer