[PATCH v2] ARM: KVM: add irqfd and irq routing support
Eric Auger
eric.auger at linaro.org
Fri Jun 6 00:41:51 PDT 2014
On 06/05/2014 06:08 PM, Christoffer Dall wrote:
> On Thu, Jun 05, 2014 at 05:58:02PM +0200, Eric Auger wrote:
>> On 06/05/2014 04:39 PM, Christoffer Dall wrote:
>>> On Thu, Jun 05, 2014 at 03:15:15PM +0200, Eric Auger wrote:
>>>> On 06/05/2014 12:28 PM, Christoffer Dall wrote:
>>>>> On Mon, Jun 02, 2014 at 09:29:56AM +0200, Eric Auger wrote:
>>>>>> This patch enables irqfd and irq routing on ARM.
>>>>>>
>>>>>> It turns on CONFIG_HAVE_KVM_EVENTFD and CONFIG_HAVE_KVM_IRQ_ROUTING
>>>>>>
>>>>>> irqfd framework enables to assign physical IRQs to guests.
>>>>>>
>>>>>> 1) user-side uses KVM_IRQFD VM ioctl to pass KVM a kvm_irqfd struct that
>>>>>> associates a VM, an eventfd, an IRQ number (aka. the GSI). When an actor
>>>>>> signals the eventfd (typically a VFIO platform driver), the irqfd subsystem
>>>>>> injects the specified IRQ into the VM (the "GSI" takes the semantic of a
>>>>>> virtual IRQ for that guest).
>>>>>>
>>>>>> 2) the other use case is user-side does 1) and uses KVM_SET_GSI_ROUTING
>>>>>> VM ioctl to create an association between a VM, a physical IRQ (aka GSI) and
>>>>>> a virtual IRQ (aka irchip.pin). This creates a so-called GSI routing entry.
>>>>>> When someone triggers the eventfd, irqfd handles it but uses the specified
>>>>>> routing and eventually injects irqchip.pin virtual IRQ into the guest. In that
>>>>>> context the GSI takes the semantic of a physical IRQ while the irqchip.pin
>>>>>> takes the semantic of a virtual IRQ.
>>>>>>
>>>>>> in 1) routing is used by irqfd but an identity routing is created by default
>>>>>> making the gsi = irqchip.pin. Note on ARM there is a single interrupt
>>>>>> controller kind, the GIC.
>>>>>>
>>>>>> GSI routing mostly is implemented in generic irqchip.c.
>>>>>> The tiny ARM specific part is directly implemented in the virtual interrupt
>>>>>> controller (vgic.c) as it is done for powerpc for instance. This option was
>>>>>> prefered compared to implementing other #ifdef in irq_comm.c (x86 and ia64).
>>>>>> Hence irq_comm.c is not used at all.
>>>>>>
>>>>>> Routing currently is not used for anything else than irqfd IRQ injection. Only
>>>>>> SPI can be injected. This means the vgic is not totally hidden behind the
>>>>>> irqchip. There are separate discussions on PPI/SGI routing.
>>>>>
>>>>> What do you mean here? Is there an ongoing discussion on the mailing
>>>>> list somewhere?
>>>>
>>>> Hi Christoffer,
>>>>
>>>> Thanks for the review.
>>>>
>>>> I was refering to that thread where it was invoked to put the whole vgic
>>>> behind irqchip:
>>>> http://www.spinics.net/lists/kvm-arm/msg08413.html
>>>>>
>>>>>>
>>>>>> Only level sensitive IRQs are supported (with a registered resampler). As a
>>>>>
>>>>> Is it not trivial to add edge-triggered support in the same go?
>>>> Yes it shouldn't be a problem. It is more a matter of testing.
>>>>>
>>>>>> reminder the resampler is a second eventfd called by irqfd framework when the
>>>>>> virtual IRQ is completed by the guest. This eventfd is supposed to be handled
>>>>>> on user-side
>>>>>>
>>>>>> MSI routing is not supported yet.
>>>>>>
>>>>>> This work was tested with Calxeda Midway xgmac main interrupt (with and without
>>>>>> explicit user routing) with qemu-system-arm and QEMU VFIO platform device.
>>>>>>
>>>>>> changes v1 -> v2:
>>>>>> 2 fixes:
>>>>>> - v1 assumed gsi/irqchip.pin was already incremented by VGIC_NR_PRIVATE_IRQS.
>>>>>> This is now vgic_set_assigned_irq that increments it before injection.
>>>>>> - v2 now handles the case where a pending assigned irq is cleared through
>>>>>> MMIO access. The irq is properly acked allowing the resamplefd handler
>>>>>> to possibly unmask the physical IRQ.
>>>>>>
>>>>>> Signed-off-by: Eric Auger <eric.auger at linaro.org>
>>>>>>
>>>>>> Conflicts:
>>>>>> Documentation/virtual/kvm/api.txt
>>>>>> arch/arm/kvm/Kconfig
>>>>>>
>>>>>> Conflicts:
>>>>>> Documentation/virtual/kvm/api.txt
>>>>>
>>>>> We usually don't include these conflict notices when sending out
>>>>> patches.
>>>>
>>>> OK I will remove them
>>>>>
>>>>>> ---
>>>>>> Documentation/virtual/kvm/api.txt | 4 +-
>>>>>> arch/arm/include/uapi/asm/kvm.h | 8 +++
>>>>>> arch/arm/kvm/Kconfig | 2 +
>>>>>> arch/arm/kvm/Makefile | 1 +
>>>>>> arch/arm/kvm/irq.h | 25 +++++++
>>>>>> virt/kvm/arm/vgic.c | 141 ++++++++++++++++++++++++++++++++++++--
>>>>>> 6 files changed, 174 insertions(+), 7 deletions(-)
>>>>>> create mode 100644 arch/arm/kvm/irq.h
>>>>>>
>>>>>> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
>>>>>> index b4f5365..b376334 100644
>>>>>> --- a/Documentation/virtual/kvm/api.txt
>>>>>> +++ b/Documentation/virtual/kvm/api.txt
>>>>>> @@ -1339,7 +1339,7 @@ KVM_ASSIGN_DEV_IRQ. Partial deassignment of host or guest IRQ is allowed.
>>>>>> 4.52 KVM_SET_GSI_ROUTING
>>>>>>
>>>>>> Capability: KVM_CAP_IRQ_ROUTING
>>>>>> -Architectures: x86 ia64 s390
>>>>>> +Architectures: x86 ia64 s390 arm
>>>>>> Type: vm ioctl
>>>>>> Parameters: struct kvm_irq_routing (in)
>>>>>> Returns: 0 on success, -1 on error
>>>>>> @@ -2126,7 +2126,7 @@ into the hash PTE second double word).
>>>>>> 4.75 KVM_IRQFD
>>>>>>
>>>>>> Capability: KVM_CAP_IRQFD
>>>>>> -Architectures: x86 s390
>>>>>> +Architectures: x86 s390 arm
>>>>>> Type: vm ioctl
>>>>>> Parameters: struct kvm_irqfd (in)
>>>>>> Returns: 0 on success, -1 on error
>>>>>> diff --git a/arch/arm/include/uapi/asm/kvm.h b/arch/arm/include/uapi/asm/kvm.h
>>>>>> index ef0c878..89b864d 100644
>>>>>> --- a/arch/arm/include/uapi/asm/kvm.h
>>>>>> +++ b/arch/arm/include/uapi/asm/kvm.h
>>>>>> @@ -192,6 +192,14 @@ struct kvm_arch_memory_slot {
>>>>>> /* Highest supported SPI, from VGIC_NR_IRQS */
>>>>>> #define KVM_ARM_IRQ_GIC_MAX 127
>>>>>>
>>>>>> +/* needed by IRQ routing */
>>>>>> +
>>>>>> +/* One single KVM irqchip, ie. the VGIC */
>>>>>> +#define KVM_NR_IRQCHIPS 1
>>>>>> +
>>>>>> +/* virtual interrupt controller input pins (max 480 SPI, 32 SGI/PPI) */
>>>>>> +#define KVM_IRQCHIP_NUM_PINS 256
>>>>>
>>>>> I don't even see how the comment correlates to the define. Hmmm. But
>>>>> since Marc asked you to change this anyhow, maybe this doesn't matter
>>>>> now.
>>>>
>>>> yes you're right. Those were the figures I was able to find in GIC400
>>>> TRM and I was confused by the fact I was not able to find them in the
>>>> code so I eventually put the same value as VGIC_NR_IRQS. I started
>>>> looking at kvm-arm64/vgic-dyn and found this dist nr_irqs but need more
>>>> time to investigate. Nethertheless his KVM_IRQCHIP_NUM_PINS is used in
>>>> generic code (irqchip).
>>>>>
>>>>>> +
>>>>>> /* PSCI interface */
>>>>>> #define KVM_PSCI_FN_BASE 0x95c1ba5e
>>>>>> #define KVM_PSCI_FN(n) (KVM_PSCI_FN_BASE + (n))
>>>>>> diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
>>>>>> index 4be5bb1..096692c 100644
>>>>>> --- a/arch/arm/kvm/Kconfig
>>>>>> +++ b/arch/arm/kvm/Kconfig
>>>>>> @@ -24,6 +24,7 @@ config KVM
>>>>>> select KVM_MMIO
>>>>>> select KVM_ARM_HOST
>>>>>> depends on ARM_VIRT_EXT && ARM_LPAE && !CPU_BIG_ENDIAN
>>>>>> + select HAVE_KVM_EVENTFD
>>>>>> ---help---
>>>>>> Support hosting virtualized guest machines. You will also
>>>>>> need to select one or more of the processor modules below.
>>>>>> @@ -56,6 +57,7 @@ config KVM_ARM_VGIC
>>>>>> bool "KVM support for Virtual GIC"
>>>>>> depends on KVM_ARM_HOST && OF
>>>>>> select HAVE_KVM_IRQCHIP
>>>>>> + select HAVE_KVM_IRQ_ROUTING
>>>>>> default y
>>>>>> ---help---
>>>>>> Adds support for a hardware assisted, in-kernel GIC emulation.
>>>>>> diff --git a/arch/arm/kvm/Makefile b/arch/arm/kvm/Makefile
>>>>>> index 789bca9..29de111 100644
>>>>>> --- a/arch/arm/kvm/Makefile
>>>>>> +++ b/arch/arm/kvm/Makefile
>>>>>> @@ -21,4 +21,5 @@ obj-y += kvm-arm.o init.o interrupts.o
>>>>>> obj-y += arm.o handle_exit.o guest.o mmu.o emulate.o reset.o
>>>>>> obj-y += coproc.o coproc_a15.o coproc_a7.o mmio.o psci.o perf.o
>>>>>> obj-$(CONFIG_KVM_ARM_VGIC) += $(KVM)/arm/vgic.o
>>>>>> +obj-$(CONFIG_HAVE_KVM_EVENTFD) += $(KVM)/eventfd.o $(KVM)/irqchip.o
>>>>>> obj-$(CONFIG_KVM_ARM_TIMER) += $(KVM)/arm/arch_timer.o
>>>>>> diff --git a/arch/arm/kvm/irq.h b/arch/arm/kvm/irq.h
>>>>>> new file mode 100644
>>>>>> index 0000000..4d6fcc6
>>>>>> --- /dev/null
>>>>>> +++ b/arch/arm/kvm/irq.h
>>>>>> @@ -0,0 +1,25 @@
>>>>>> +/*
>>>>>> + * Copyright (C) 2014, STMicroelectronics
>>>>>> + * Authors: Eric Auger <eric.auger at st.com>
>>>>>
>>>>> please use the Linaro copyright notice for this. You can add your
>>>>> @st.com e-mail in addition to the Linaro one in case you want that for
>>>>> long-term support.
>>>>
>>>> OK I will do
>>>>>
>>>>>> + *
>>>>>> + * This program is free software; you can redistribute it and/or modify
>>>>>> + * it under the terms of the GNU General Public License, version 2, as
>>>>>> + * published by the Free Software Foundation.
>>>>>> + *
>>>>>> + * This program is distributed in the hope that it will be useful,
>>>>>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>>>>>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
>>>>>> + * GNU General Public License for more details.
>>>>>> + *
>>>>>> + */
>>>>>> +
>>>>>> +#ifndef __IRQ_H
>>>>>> +#define __IRQ_H
>>>>>> +
>>>>>> +#include <linux/kvm_host.h>
>>>>>> +/*
>>>>>> + * Placeholder for irqchip and irq/msi routing declarations
>>>>>> + * included in irqchip.c
>>>>>> + */
>>>>>
>>>>> But none needed now?
>>>>
>>>> Yes nothing for the time being.
>>>>>
>>>>>> +
>>>>>> +#endif
>>>>>> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
>>>>>> index 56ff9be..39afa0d 100644
>>>>>> --- a/virt/kvm/arm/vgic.c
>>>>>> +++ b/virt/kvm/arm/vgic.c
>>>>>> @@ -93,6 +93,9 @@ static struct device_node *vgic_node;
>>>>>> #define ACCESS_WRITE_VALUE (3 << 1)
>>>>>> #define ACCESS_WRITE_MASK(x) ((x) & (3 << 1))
>>>>>>
>>>>>> +static struct kvm_irq_routing_entry identity_table[VGIC_NR_IRQS];
>>>>>> +static int set_default_routing_table(struct kvm *kvm);
>>>>>> +
>>>>>> static void vgic_retire_disabled_irqs(struct kvm_vcpu *vcpu);
>>>>>> static void vgic_update_state(struct kvm *kvm);
>>>>>> static void vgic_kick_vcpus(struct kvm *kvm);
>>>>>> @@ -408,11 +411,27 @@ static bool handle_mmio_clear_pending_reg(struct kvm_vcpu *vcpu,
>>>>>> struct kvm_exit_mmio *mmio,
>>>>>> phys_addr_t offset)
>>>>>> {
>>>>>> - u32 *reg = vgic_bitmap_get_reg(&vcpu->kvm->arch.vgic.irq_state,
>>>>>> + struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
>>>>>> + unsigned int i;
>>>>>> + bool is_assigned_irq;
>>>>>> + DECLARE_BITMAP(old, VGIC_NR_SHARED_IRQS);
>>>>>> + DECLARE_BITMAP(diff, VGIC_NR_SHARED_IRQS);
>>>>>> + unsigned long *pending =
>>>>>> + vgic_bitmap_get_shared_map(&dist->irq_state);
>>>>>> + u32 *reg;
>>>>>
>>>>> please add a blank line between your declarations and the actual code.
>>>>
>>>> OK
>>>>>
>>>>>> + bitmap_copy(old, pending, VGIC_NR_SHARED_IRQS);
>>>>>> + reg = vgic_bitmap_get_reg(&vcpu->kvm->arch.vgic.irq_state,
>>>>>> vcpu->vcpu_id, offset);
>>>>>> vgic_reg_access(mmio, reg, offset,
>>>>>> ACCESS_READ_VALUE | ACCESS_WRITE_CLEARBIT);
>>>>>> if (mmio->is_write) {
>>>>>> + pending = vgic_bitmap_get_shared_map(&dist->irq_state);
>>>>>> + bitmap_xor(diff, old, pending, VGIC_NR_SHARED_IRQS);
>>>>>> + for_each_set_bit(i, diff, VGIC_NR_SHARED_IRQS) {
>>>>>> + is_assigned_irq = kvm_irq_has_notifier(vcpu->kvm, 0, i);
>>>>>> + if (is_assigned_irq)
>>>>>> + kvm_notify_acked_irq(vcpu->kvm, 0, i);
>>>>>> + }
>>>>>
>>>>> As Mark pointed out, just copy the vgic reg value and do a simple xor on
>>>>> the two instead, and factor out the two lines that check for
>>>>> is_assigned_irq and calles notify_acked so that you can give a small
>>>>> static function a semantically meaningful name and call that from both
>>>>> this function and the process_maintenance function.
>>>>
>>>> Yes I corrected this after looking into more details at the register
>>>> semantic.
>>>>>
>>>>> That being said, this whole thing feels a bit weird, I'll comment on the
>>>>> other thread.
>>>> OK
>>>>>
>>>>>> vgic_update_state(vcpu->kvm);
>>>>>> return true;
>>>>>> }
>>>>>> @@ -1172,6 +1191,8 @@ static bool vgic_process_maintenance(struct kvm_vcpu *vcpu)
>>>>>> {
>>>>>> struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
>>>>>> bool level_pending = false;
>>>>>> + struct kvm *kvm;
>>>>>> + int is_assigned_irq;
>>>>>>
>>>>>> kvm_debug("MISR = %08x\n", vgic_cpu->vgic_misr);
>>>>>>
>>>>>> @@ -1189,12 +1210,23 @@ static bool vgic_process_maintenance(struct kvm_vcpu *vcpu)
>>>>>> vgic_irq_clear_active(vcpu, irq);
>>>>>> vgic_cpu->vgic_lr[lr] &= ~GICH_LR_EOI;
>>>>>>
>>>>>> + kvm = vcpu->kvm;
>>>>>> + is_assigned_irq =
>>>>>> + kvm_irq_has_notifier(kvm, 0, irq-VGIC_NR_PRIVATE_IRQS);
>>>>>
>>>>> I think the preferred style is to keep the function call name on the
>>>>> same line as the assignment, and the do a line break on the parameter
>>>>> that doesn't fit in the line width and align that to the opening
>>>>> parenthesis on the funciton call. At least I prefer it that way.
>>>>
>>>> OK I will change this.
>>>>>
>>>>> also spaces around the '-', please.
>>>>>
>>>>>> /* Any additional pending interrupt? */
>>>>>
>>>>> This comment seems to only aplly for non-assigned IRQs now, right?
>>>>
>>>> yes it does
>>>>>
>>>
>>> so move the comment in the else-clause, if it's an assigned irq then
>>> that's not what you're handling here.
>>
>> correct
>>
>> But come to think of it, don't we
>>> need to check if the line is still high?
>>
>> by construction it should not be possible since the physical IRQ is
>> masked by the VFIO driver.
>
> But this mechanism is not necessarily tied to VFIO is it?
>
Yes that's correct. but it is tied to eventfd mechanism. I need to
further check what it induces as constraints. vhost uses it. Would be
good to put this in place yo test genericity of next versions.
>>>
>>>>>> - if (vgic_dist_irq_is_pending(vcpu, irq)) {
>>>>>> - vgic_cpu_irq_set(vcpu, irq);
>>>>>> - level_pending = true;
>>>>>> - } else {
>>>>>> + if (is_assigned_irq) {
>>>>>> vgic_cpu_irq_clear(vcpu, irq);
>>>>>> + kvm_debug("EOI irqchip routed vIRQ %d\n", irq);
>>>>>> + kvm_notify_acked_irq(kvm, 0,
>>>>>> + irq-VGIC_NR_PRIVATE_IRQS);
>>>>>
>>>>> spaces around the '-', please.
>>>> OK
>>>>>
>>>>>> + vgic_dist_irq_clear(vcpu, irq);
>>>>>> + } else {
>>>>>> + if (vgic_dist_irq_is_pending(vcpu, irq)) {
>>>>>> + vgic_cpu_irq_set(vcpu, irq);
>>>>>> + level_pending = true;
>>>>>> + } else {
>>>>>> + vgic_cpu_irq_clear(vcpu, irq);
>>>>>> + }
>>>>>> }
>>>>>>
>>>>>> /*
>>>>>> @@ -1627,6 +1659,8 @@ int kvm_vgic_create(struct kvm *kvm)
>>>>>> kvm->arch.vgic.vgic_dist_base = VGIC_ADDR_UNDEF;
>>>>>> kvm->arch.vgic.vgic_cpu_base = VGIC_ADDR_UNDEF;
>>>>>>
>>>>>> + set_default_routing_table(kvm);
>>>>>> +
>>>>>> out_unlock:
>>>>>> for (; vcpu_lock_idx >= 0; vcpu_lock_idx--) {
>>>>>> vcpu = kvm_get_vcpu(kvm, vcpu_lock_idx);
>>>>>> @@ -2017,3 +2051,100 @@ struct kvm_device_ops kvm_arm_vgic_v2_ops = {
>>>>>> .get_attr = vgic_get_attr,
>>>>>> .has_attr = vgic_has_attr,
>>>>>> };
>>>>>> +
>>>>>> +
>>>>>> +/*
>>>>>> + * set up a default identity routing table
>>>>>> + * The user-side can further change the routing table using
>>>>>> + * KVM_SET_GSI_ROUTING VM ioctl
>>>>>> + */
>>>>>> +
>>>>>
>>>>> don't add this whitespace between comments and a function declaration,
>>>>> please check the entire patch for this.
>>>> OK
>>>>>
>>>>>> +static int set_default_routing_table(struct kvm *kvm)
>>>>>> +{
>>>>>> + struct kvm_irq_routing_entry;
>>>>>> + int i;
>>>>>> + for (i = 0; i < VGIC_NR_IRQS; i++) {
>>>>>> + identity_table[i].gsi = i;
>>>>>> + identity_table[i].type = KVM_IRQ_ROUTING_IRQCHIP;
>>>>>> + identity_table[i].u.irqchip.irqchip = 0;
>>>>>> + identity_table[i].u.irqchip.pin = i;
>>>>>> + }
>>>>>> + return kvm_set_irq_routing(kvm, identity_table,
>>>>>> + ARRAY_SIZE(identity_table), 0);
>>>>>
>>>>> is identity table used after this stage? If not, could you not
>>>>> dynamically allocate it and free it after use so we don't waste this
>>>>> staic allocation of memory in the kernel?
>>>>
>>>> yes this definitively can be optimized.
>>>>>
>>>>>> +}
>>>>>> +
>>>>>> +
>>>>>> +/*
>>>>>> + * Functions needed for GSI routing (used by irqchip.c)
>>>>>> + * implemented in irq_comm.c for x86 and ia64
>>>>>> + * in architecture specific files for some other archictures (powerpc)
>>>>>> + */
>>>>>
>>>>> Hmmm, this comment seems rather pointless in this file. If you want to
>>>>> describe what this function does, then just document this functionality
>>>>> and the parameters/return value, i.e.
>>>>>
>>>>> vgic_set_assigned_irq - set state of IRQs driven by irqfd
>>>>>
>>>>> When an IRQ is raised or lowered.... blah blah blah blah.
>>>>>
>>>>> @e: the routing entry describing...
>>>>> @kvm: the kvm struct
>>>>> and so on.
>>>>
>>>>>
>>>>> return 0 on success, -error on errors.
>>>>>
>>>>
>>>> OK
>>>>>> +
>>>>>> +static int vgic_set_assigned_irq(struct kvm_kernel_irq_routing_entry *e,
>>>>>> + struct kvm *kvm, int irq_source_id, int level,
>>>>>> + bool line_status)
>>>>>> +{
>>>>>> + unsigned int spi = e->irqchip.pin + VGIC_NR_PRIVATE_IRQS;
>>>>>
>>>>> I noticed this in the changelogs too, where's the rationale/API
>>>>> documentaiton for the use of irqchip.pin and its semantics on ARM?
>>>>>
>>>>> Should we add something in Documentation/virtual/kvm/api.txt ?
>>>>
>>>> - in 4.24 KVM_CREATE_IRQCHIP I might add that similarly to s390 a dummy
>>>> identity table is created.
>>>>
>>>> - KVM_CAP_IRQFD says "kvm_irqfd.gsi specifies the irqchip pin toggled by
>>>> this event. When an event is triggered on the eventfd, an interrupt is
>>>> injected into the guest using the specified gsi pin"
>>>>
>>>> Assuming the standard use case is to use an identity/dummy GSI table the
>>>> irqchip.pin still remains the "irqchip pin" toggled on eventfd.
>>>>
>>>> By the way I might remove the mention to the use case where the gsi !=
>>>> irqchip.pin.
>>>>
>>>> Now when reading 4.25 KVM_IRQ_LINE, it is said that it is used to inject
>>>> a GSI as well.
>>>>
>>>> Now on ARM The GSI has the following content.
>>>>
>>>> bits: | 31 ... 24 | 23 ... 16 | 15 ... 0 |
>>>> field: | irq_type | vcpu_index | irq_id |
>>>>
>>>> As such that's true that currently there is an inconsistency.
>>>>
>>>> Currently my GSI == spi number whereas the GSI as injected by
>>>> KVM_IRQ_LINE has the above format.
>>>
>>> But the ARM use of KVM_IRQFD is not clearly documented.
>>>
>>> I think this patch needs to include adding arm to the "Architectures"
>>> list in api.txt and clearly document what the semantics of the fields of
>>> the struct kvm_irqfd are.
>>
>> Yes indeed, this deserves such clarification, all the more so there is
>> the above inconsistency. We come back to the discussion about relevance
>> of routing other things than SPI actually. use case?
>>>
>>>>>
>>>>>> +
>>>>>> + if (irq_source_id == KVM_USERSPACE_IRQ_SOURCE_ID) {
>>>>>> + /*
>>>>>> + * This path is not tested yet,
>>>>>> + * only irqchip with resampler was exercised
>>>>>> + */
>>>>>> + kvm_vgic_inject_irq(kvm, 0, spi, level);
>>>>>
>>>>> hmmm, stuff like this definitely makes this patch an RFC.
>>>>
>>>> Yes I acknowledge I shall step back to RFC. I was a bit eager.
>>>
>>> The alternative is to return an error and put a big fat comment saying
>>> this is NOT IMPLEMENTED yet.
>>
>> actually this KVM_USERSPACE_IRQ_SOURCE_ID path is entered when injecting
>> IRQs without resamplerfd (is edge sensitive ones). As such we can link
>> this issue to your above related comment.
>>
>> Anyway I will move it to RFC since there are quite a lot of open points,
>> especially the next one.
>>>
>>>>>
>>>>>> + } else if (irq_source_id == KVM_IRQFD_RESAMPLE_IRQ_SOURCE_ID) {
>>>>>> + if (level == 1) {
>>>>>
>>>>> This seems very wrong. What if an external device raises a
>>>>> level-triggered IRQ, but then lowers it again, without the guest was
>>>>> ever running, now you have to wait until the guest sees the (now
>>>>> inactive) interrupt and EOIs it before the interrupt is lowered on the
>>>>> vgic?
>>>> Hum I am bit confused here. when you enter that code, this means an
>>>> irqfd was triggered. This irqfd was registered by some user code that is
>>>> supposed to be alive and do the proper action to complete the IRQ. in my
>>>> case the xgmac toggles down the IRQ if anyone resets the IRQ status
>>>> register? this action is done in the xgmac guest ISR. A device
>>>> reset/error? might provoke the IRQ pin toggling done. Do you see any
>>>> other events?
>>>
>>> Sure, ok, forget about the 'guest wasn't running', but if the guest ISR
>>> is run with guest interrupts disabled and resets the xgmac device (or
>>> imagine some other device that just lowers the level triggered interrupt
>>> after some timeout), then how does the current code handle this?
>> yes I aknowledge current implementation relies on 2 assumptions:
>> - either the EOI happens or
>> - the pending IRQ is cleared through ICPENDRn access
>>
>> and in case of reset typically we are in trouble to clear the distributor.
>>
>> But I currently do not see any way to detect the device IRQ is toggled
>> down without trapping something.
>
> I guess that's hard without trapping or probing the VFIO driver, but the
> current scheme just seems to fragile and seems to be built around a very
> specific behavior of your guest and host, and not supporting a generic
> solution.
>
> Either we will have to trap MMIO to know what's going on, or perhaps
> probe the VFIO state when the interrupt is enabled on the vgic by the
> guest.
>
agreed.
>>>
>>>>>
>>>>>> + kvm_debug("Inject irqchip routed vIRQ %d\n",
>>>>>> + e->irqchip.pin);
>>>>>> + kvm_vgic_inject_irq(kvm, 0, spi, level);
>>>>>> + /*
>>>>>> + * toggling down vIRQ wire is directly handled in
>>>>>> + * process_maintenance for this reason:
>>>>>> + * irqfd_resampler_ack is called in
>>>>>> + * process_maintenance which holds the dist lock.
>>>>>> + * irqfd_resampler_ack calls kvm_set_irq
>>>>>> + * which ends_up calling kvm_vgic_inject_irq.
>>>>>> + * This later attempts to take the lock -> deadlock!
>>>>>> + */
>>>>>
>>>>> Not sure I understand this comment. What are we trying to achieve, are
>>>>> we using some sort of a workaround to avoid a deadlock?
>>>>
>>>> What I wanted to point out here is I would have prefered to handle both
>>>> levels 0 and 1 in a symetrical manner. irqfd_resampler_ack (in eventfd)
>>>> is calling kvm_set_irq with level 0. This would be the prefered way to
>>>> toggle down the SPI at GIC input instead of doing this in
>>>> process_maintenance in a dirty manner. However this does work because
>>>> irqfd_resampler_ack is called in process_maintenance (the place where
>>>> the EOI is analyzed). process_maintenance holds the dist lock and would
>>>> eventually call kvm_vgic_inject_irq which also attempts to take the lock.
>>>>
>>>
>>> I'm afraid that's too much of a hack. There's an external mechanism to
>>> set an interrupt line to active (level=1) or inactive (level=0) and we
>>> must support both.
>>
>>> The fact that vgic_process_maintenance() can set the interrupt line to
>>> inactive is just something we exploit to properly handle level-triggered
>>> interrupts, but the main API to the VGIC must absolutely be supported.
>>>
>>> Am I completely wrong here?
>>
>> Well I am not sure what you call here the VGIC API? Is it
>> kvm_vgic_inject_irq? I agree with you on the fact It would be cleaner to
>> use this later in both edge transitions.
>
> Yes, the "set" function pointer, which eventually calls
> vgic_set_assigned_irq(), is your API. If that API carries semantics
> that you can raise AND lower the irq through that API, then we need to
> support that I think.
agreed too
>
> Thanks,
> -Christoffer
>
More information about the linux-arm-kernel
mailing list