[PATCH v2 6/8] arm/arm64: KVM: Add forwarded physical interrupts documentation
Christoffer Dall
christoffer.dall at linaro.org
Wed Sep 9 01:49:48 PDT 2015
On Tue, Sep 8, 2015 at 6:57 PM, Andre Przywara <andre.przywara at arm.com> wrote:
> Hi Eric,
>
> thanks for you answer.
>
> On 08/09/15 09:43, Eric Auger wrote:
>> Hi Andre,
>> On 09/07/2015 01:25 PM, Andre Przywara wrote:
>>> Hi,
>>>
>>> firstly: this text is really great, thanks for coming up with that.
>>> See below for some information I got from tracing the host which I
>>> cannot make sense of....
>>>
>>>
>>> On 04/09/15 20:40, Christoffer Dall wrote:
>>>> Forwarded physical interrupts on arm/arm64 is a tricky concept and the
>>>> way we deal with them is not apparently easy to understand by reading
>>>> various specs.
>>>>
>>>> Therefore, add a proper documentation file explaining the flow and
>>>> rationale of the behavior of the vgic.
>>>>
>>>> Some of this text was contributed by Marc Zyngier and edited by me.
>>>> Omissions and errors are all mine.
>>>>
>>>> Signed-off-by: Christoffer Dall <christoffer.dall at linaro.org>
>>>> ---
>>>> Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt | 181 +++++++++++++++++++++
>>>> 1 file changed, 181 insertions(+)
>>>> create mode 100644 Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
>>>>
>>>> diff --git a/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt b/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
>>>> new file mode 100644
>>>> index 0000000..24b6f28
>>>> --- /dev/null
>>>> +++ b/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
>>>> @@ -0,0 +1,181 @@
>>>> +KVM/ARM VGIC Forwarded Physical Interrupts
>>>> +==========================================
>>>> +
>>>> +The KVM/ARM code implements software support for the ARM Generic
>>>> +Interrupt Controller's (GIC's) hardware support for virtualization by
>>>> +allowing software to inject virtual interrupts to a VM, which the guest
>>>> +OS sees as regular interrupts. The code is famously known as the VGIC.
>>>> +
>>>> +Some of these virtual interrupts, however, correspond to physical
>>>> +interrupts from real physical devices. One example could be the
>>>> +architected timer, which itself supports virtualization, and therefore
>>>> +lets a guest OS program the hardware device directly to raise an
>>>> +interrupt at some point in time. When such an interrupt is raised, the
>>>> +host OS initially handles the interrupt and must somehow signal this
>>>> +event as a virtual interrupt to the guest. Another example could be a
>>>> +passthrough device, where the physical interrupts are initially handled
>>>> +by the host, but the device driver for the device lives in the guest OS
>>>> +and KVM must therefore somehow inject a virtual interrupt on behalf of
>>>> +the physical one to the guest OS.
>>>> +
>>>> +These virtual interrupts corresponding to a physical interrupt on the
>>>> +host are called forwarded physical interrupts, but are also sometimes
>>>> +referred to as 'virtualized physical interrupts' and 'mapped interrupts'.
>>>> +
>>>> +Forwarded physical interrupts are handled slightly differently compared
>>>> +to virtual interrupts generated purely by a software emulated device.
>>>> +
>>>> +
>>>> +The HW bit
>>>> +----------
>>>> +Virtual interrupts are signalled to the guest by programming the List
>>>> +Registers (LRs) on the GIC before running a VCPU. The LR is programmed
>>>> +with the virtual IRQ number and the state of the interrupt (Pending,
>>>> +Active, or Pending+Active). When the guest ACKs and EOIs a virtual
>>>> +interrupt, the LR state moves from Pending to Active, and finally to
>>>> +inactive.
>>>> +
>>>> +The LRs include an extra bit, called the HW bit. When this bit is set,
>>>> +KVM must also program an additional field in the LR, the physical IRQ
>>>> +number, to link the virtual with the physical IRQ.
>>>> +
>>>> +When the HW bit is set, KVM must EITHER set the Pending OR the Active
>>>> +bit, never both at the same time.
>>>> +
>>>> +Setting the HW bit causes the hardware to deactivate the physical
>>>> +interrupt on the physical distributor when the guest deactivates the
>>>> +corresponding virtual interrupt.
>>>> +
>>>> +
>>>> +Forwarded Physical Interrupts Life Cycle
>>>> +----------------------------------------
>>>> +
>>>> +The state of forwarded physical interrupts is managed in the following way:
>>>> +
>>>> + - The physical interrupt is acked by the host, and becomes active on
>>>> + the physical distributor (*).
>>>> + - KVM sets the LR.Pending bit, because this is the only way the GICV
>>>> + interface is going to present it to the guest.
>>>> + - LR.Pending will stay set as long as the guest has not acked the interrupt.
>>>> + - LR.Pending transitions to LR.Active on the guest read of the IAR, as
>>>> + expected.
>>>> + - On guest EOI, the *physical distributor* active bit gets cleared,
>>>> + but the LR.Active is left untouched (set).
>>>
>>> I tried hard in the last week, but couldn't confirm this. Tracing shows
>>> the following pattern over and over (case 1):
>>> (This is the kvm/kvm.git:queue branch from last week, so including the
>>> mapped timer IRQ code. Tests were done on Juno and Midway)
>>>
>>> ...
>>> 229.340171: kvm_exit: TRAP: HSR_EC: 0x0001 (WFx), PC: 0xffffffc000098a64
>>> 229.340324: kvm_exit: IRQ: HSR_EC: 0x0001 (WFx), PC: 0xffffffc0001c63a0
>>> 229.340428: kvm_exit: TRAP: HSR_EC: 0x0024 (DABT_LOW), PC:
>>> 0xffffffc0004089d8
>>> 229.340430: kvm_vgic_sync_hwstate: LR0 vIRQ: 27, HWIRQ: 27, LR.state: 8,
>>> ELRSR: 1, dist active: 0, log. active: 1
>>> ....
>>>
>>> My hunch is that the following happens (please correct me if needed!):
>>> First there is an unrelated trap (line 1), then later the guest exits
>>> due to to an IRQ (line 2, presumably the timer, the WFx is a red herring
>>> here since ESR_EL2.EC is not valid on IRQ triggered exceptions).
>>> The host injects the timer IRQ (not shown here) and returns to the
>>> guest. On the next trap (line 3, due to a stage 2 page fault),
>>> vgic_sync_hwirq() will be called on the LR (line 4) and shows that the
>>> GIC actually did deactivate both the LR (state=8, which is inactive,
>>> just the HW bit is still set) _and_ the state on the physical
>>> distributor (dist active=0). This trace_printk is just after entering
>>> the function, so before the code there performs these steps redundantly.
>>> Also it shows that the ELRSR bit is set to 1 (empty), so from the GIC
>>> point of view this virtual IRQ cycle is finished.
>>>
>>> The other sequence I see is this one (case 2):
>>>
>>> ....
>>> 231.055324: kvm_exit: IRQ: HSR_EC: 0x0001 (WFx), PC: 0xffffffc0000f0e70
>>> 231.055329: kvm_exit: TRAP: HSR_EC: 0x0024 (DABT_LOW), PC:
>>> 0xffffffc0004089d8
>>> 231.055331: kvm_vgic_sync_hwstate: LR0 vIRQ: 27, HWIRQ: 27, LR.state: 9,
>>> ELRSR: 0, dist active: 1, log. active: 1
>>> 231.055338: kvm_exit: IRQ: HSR_EC: 0x0024 (DABT_LOW), PC: 0xffffffc0004089dc
>>> 231.055340: kvm_vgic_sync_hwstate: LR0 vIRQ: 27, HWIRQ: 27, LR.state: 9,
>>> ELRSR: 0, dist active: 0, log. active: 1
>>> ...
>>>
>>> In line 1 the timer fires, the host injects the timer IRQ into the
>>> guest, which exits again in line 2 due to a page fault (may have IRQs
>>> disabled?). The LR dump in line 3 shows that the timer IRQ is still
>>> pending in the LR (state=9) and active on the physical distributor. Now
>>> the code in vgic_sync_hwirq() clears the active state in the physical
>>> distributor (by calling irq_set_irqchip_state()), but leaves the LR
>>> alone (by returning 0 to the caller).
>>> On the next exit (line 4, due to some HW IRQ?) the LR is still the same
>>> (line 5), only that the physical dist state in now inactive (due to us
>>> clearing that explicitly during the last exit).
>> Normally the physical dist state was set active on previous flush, right
>> (done for all mapped IRQs)?
>
> Where is this done? I see that the physical dist state is altered on the
> actual IRQ forwarding, but not on later exits/entries? Do you mean
> kvm_vgic_flush_hwstate() with "flush"?
this is a bug and should be fixed in the 'fixes' patches I sent last
week. We should set active state on every entry to the guest for IRQs
with the HW bit set in either pending or active state.
>
>> So are you sure the IRQ was not actually
>> completed by the guest? As Christoffer mentions the LR active state can
>> remain even if the IRQ was completed.
>
> I was wondering where this behaviour Christoffer mentioned comes from?
>From how I understand the architecture and from talking to Marc.
> Is this an observation, an implementation bug or is this mentioned in
> the spec? Needing to spoon-feed the VGIC by doing it's job sounds a bit
> awkward to me.
What do you mean? How are we spoon-feeding the VGIC?
> I will try to add more tracing to see what is actually happening, trying
> to trace a timer IRQ life cycle more accurately to see what's going on here.
>
By all means, trace through the thing, it would be great to get others
to look at this, but I recommend applying both the fixes I sent and
this v2 timer rework series before doing so, because otherwise things
don't work as I outline in this document.
-Christoffer
More information about the linux-arm-kernel
mailing list