[RFC PATCH 0/6] ARM64: KVM: PMU infrastructure support

Mon Feb 16 04:16:54 PST 2015

Hi Christoffer,

On Sun, Feb 15, 2015 at 9:03 PM, Christoffer Dall
<christoffer.dall at linaro.org> wrote:
> Hi Anup,
>
> On Mon, Jan 12, 2015 at 09:49:13AM +0530, Anup Patel wrote:
>> On Mon, Jan 12, 2015 at 12:41 AM, Christoffer Dall
>> <christoffer.dall at linaro.org> wrote:
>> > On Tue, Dec 30, 2014 at 11:19:13AM +0530, Anup Patel wrote:
>> >> (dropping previous conversation for easy reading)
>> >>
>> >> Hi Marc/Christoffer,
>> >>
>> >> I tried implementing PMU context-switch via C code
>> >> in EL1 mode and in atomic context with irqs disabled.
>> >> The context switch itself works perfectly fine but
>> >> irq forwarding is not clean for PMU irq.
>> >>
>> >> I found another issue that is GIC only samples irq
>> >> lines if they are enabled. This means for using
>> >> irq forwarding we will need to ensure that host PMU
>> >> irq is enabled.  The arch_timer code does this by
>> >> doing request_irq() for host virtual timer interrupt.
>> >> For PMU, we can either enable/disable host PMU
>> >> irq in context switch or we need to do have shared
>> >> irq handler between kvm pmu and host kernel pmu.
>> >
>> > could we simply require the host PMU driver to request the IRQ and have
>> > the driver inject the corresponding IRQ to the VM via a mechanism
>> > similar to VFIO using an eventfd and irqfds etc.?
>>
>> Currently, the host PMU driver does request_irq() only when
>> there is some event to be monitored. This means host will do
>> request_irq() only when we run perf application on host
>> user space.
>>
>> Initially, I though that we could simply pass IRQF_SHARED
>> for request_irq() in host PMU driver and do the same for
>> reqest_irq() in KVM PMU code but the PMU irq can be
>> SPI or PPI. If the PMU irq is SPI then IRQF_SHARED
>> flag would fine but if its PPI then we have no way to
>> set IRQF_SHARED flag because request_percpu_irq()
>> does not have irq flags parameter.
>>
>> >
>> > (I haven't quite thought through if there's a way for the host PMU
>> > driver to distinguish between an IRQ for itself and one for the guest,
>> > though).
>> >
>> > It does feel like we will need some sort of communication/coordination
>> > between the host PMU driver and KVM...
>> >
>> >>
>> >> I have rethinked about our discussion so far. I
>> >> understand that we need KVM PMU virtualization
>> >> to meet following criteria:
>> >> 1. No modification in host PMU driver
>> >
>> > is this really a strict requirement?  one of the advantages of KVM
>> > should be that the rest of the kernel should be supportive of KVM.
>>
>> I guess so because host PMU driver should not do things
>> differently for host and guest. I think this the reason why
>> we discarded the mask/unmask PMU irq approach which
>> I had implemented in RFC v1.
>>
>> >
>> >> 2. No modification in guest PMU driver
>> >> 3. No mask/unmask dance for sharing host PMU irq
>> >> 4. Clean way to avoid infinite VM exits due to
>> >> PMU interrupt
>> >>
>> >> I have discovered new approach which is as follows:
>> >> 1. Context switch PMU in atomic context (i.e. local_irq_disable())
>> >> 2. Ensure that host PMU irq is disabled when entering guest
>> >> mode and re-enable host PMU irq when exiting guest mode if
>> >> it was enabled previously.
>> >
>> > How does this look like software-engineering wise?  Would you be looking
>> > up the IRQ number from the DT in the KVM code again?  How does KVM then
>> > synchronize with the host PMU driver so they're not both requesting the
>> > same IRQ at the same time?
>>
>> We only lookup host PMU irq numbers from DT at HYP init time.
>>
>> During context switch we know the host PMU irq number for
>> current host CPU so we can get state of host PMU irq in
>> context switch code.
>>
>> If we go by the shard irq handler approach then both KVM
>> and host PMU driver will do request_irq() on same host
>> PMU irq. In other words, there is no virtual PMU irq provided
>> by HW for guest.
>>
>
> Sorry for the *really* long delay in this response.
>
> We had a chat about this subject with Will Deacon and Marc Zyngier
> during connect, and basically we came to think of a number of problems
> with the current approach:
>
> 1. As you pointed out, there is a need for a shared IRQ handler, and
>    there is no immediately nice way to implement this without a more
>    sophisticated perf/kvm interface, probably comprising eventfds or
>    something similar.
>
> 2. Hijacking the counters for the VM without perf knowing about it
>    basically makes it impossible to do system-wide event counting, an
>    important use case for a virtualization host.
>
> So the approach we will be taking now would be to:
>
> First, implement a strictly trap-and-emulate in software approach.  This
> would allow any software relying on access to performance counters to
> work, although potentially with slightly unprecise values.  This is the
> approach taken by x86 and would be significantly simpler to support on
> systems like big.LITTLE as well.

Actually, trap-and-emulate would also help avoid additions to the
KVM world switch.

>
> Second, if there are values obtained from within the guest that are so
> skewed by the trap-and-emulate approach that we need to give the guest
> access to counters, we should try to share the hardware by partitioning
> the physical counters, but again, we need to coordinate with the host
> perf system for this.  We would only be pursuing this approach if
> absolutely necessary.

Yes, with trap-and-emulate we cannot accurately emulate all types
of hw counters (particularly cache misses and similar events).

>
> Apologies for the change in direction on this.
>
> What are your thoughts?  Do you still have time/interest to work
> on any of this?

Its a drastic change in direction.

Currently, I have taken up some different work (not related to KVM)
so for next few months I wont be able spend time on this.

Its better if Linaro takes this work to avoid any further delays.

Best Regards,
Anup

>
> Thanks,
> -Christoffer