[PATCH v2] ARM: KVM: add irqfd and irq routing support

Eric Auger eric.auger at linaro.org
Fri Jun 6 00:41:51 PDT 2014


On 06/05/2014 06:08 PM, Christoffer Dall wrote:
> On Thu, Jun 05, 2014 at 05:58:02PM +0200, Eric Auger wrote:
>> On 06/05/2014 04:39 PM, Christoffer Dall wrote:
>>> On Thu, Jun 05, 2014 at 03:15:15PM +0200, Eric Auger wrote:
>>>> On 06/05/2014 12:28 PM, Christoffer Dall wrote:
>>>>> On Mon, Jun 02, 2014 at 09:29:56AM +0200, Eric Auger wrote:
>>>>>> This patch enables irqfd and irq routing on ARM.
>>>>>>
>>>>>> It turns on CONFIG_HAVE_KVM_EVENTFD and CONFIG_HAVE_KVM_IRQ_ROUTING
>>>>>>
>>>>>> irqfd framework enables to assign physical IRQs to guests.
>>>>>>
>>>>>> 1) user-side uses KVM_IRQFD VM ioctl to pass KVM a kvm_irqfd struct that
>>>>>> associates a VM, an eventfd, an IRQ number (aka. the GSI). When an actor
>>>>>> signals the eventfd (typically a VFIO platform driver), the irqfd subsystem
>>>>>> injects the specified IRQ into the VM (the "GSI" takes the semantic of a
>>>>>> virtual IRQ for that guest).
>>>>>>
>>>>>> 2) the other use case is user-side does 1) and uses KVM_SET_GSI_ROUTING
>>>>>> VM ioctl to create an association between a VM, a physical IRQ (aka GSI) and
>>>>>> a virtual IRQ (aka irchip.pin).  This creates a so-called GSI routing entry.
>>>>>> When someone triggers the eventfd, irqfd handles it but uses the specified
>>>>>> routing and eventually injects irqchip.pin virtual IRQ into the guest. In that
>>>>>> context the GSI takes the semantic of a physical IRQ while the irqchip.pin
>>>>>> takes the semantic of a virtual IRQ.
>>>>>>
>>>>>> in 1) routing is used by irqfd but an identity routing is created by default
>>>>>> making the gsi = irqchip.pin. Note on ARM there is a single interrupt
>>>>>> controller kind, the GIC.
>>>>>>
>>>>>> GSI routing mostly is implemented in generic irqchip.c.
>>>>>> The tiny ARM specific part is directly implemented in the virtual interrupt
>>>>>> controller (vgic.c) as it is done for powerpc for instance. This option was
>>>>>> prefered compared to implementing other #ifdef in irq_comm.c (x86 and ia64).
>>>>>> Hence irq_comm.c is not used at all.
>>>>>>
>>>>>> Routing currently is not used for anything else than irqfd IRQ injection. Only
>>>>>> SPI can be injected. This means the vgic is not totally hidden behind the
>>>>>> irqchip. There are separate discussions on PPI/SGI routing.
>>>>>
>>>>> What do you mean here?  Is there an ongoing discussion on the mailing
>>>>> list somewhere?
>>>>
>>>> Hi Christoffer,
>>>>
>>>> Thanks for the review.
>>>>
>>>> I was refering to that thread where it was invoked to put the whole vgic
>>>> behind irqchip:
>>>> http://www.spinics.net/lists/kvm-arm/msg08413.html
>>>>>
>>>>>>
>>>>>> Only level sensitive IRQs are supported (with a registered resampler). As a
>>>>>
>>>>> Is it not trivial to add edge-triggered support in the same go?
>>>> Yes it shouldn't be a problem. It is more a matter of testing.
>>>>>
>>>>>> reminder the resampler is a second eventfd called by irqfd framework when the
>>>>>> virtual IRQ is completed by the guest. This eventfd is supposed to be handled
>>>>>> on user-side
>>>>>>
>>>>>> MSI routing is not supported yet.
>>>>>>
>>>>>> This work was tested with Calxeda Midway xgmac main interrupt (with and without
>>>>>> explicit user routing) with qemu-system-arm and QEMU VFIO platform device.
>>>>>>
>>>>>> changes v1 -> v2:
>>>>>> 2 fixes:
>>>>>> - v1 assumed gsi/irqchip.pin was already incremented by VGIC_NR_PRIVATE_IRQS.
>>>>>>   This is now vgic_set_assigned_irq that increments it before injection.
>>>>>> - v2 now handles the case where a pending assigned irq is cleared through
>>>>>>   MMIO access. The irq is properly acked allowing the resamplefd handler
>>>>>>   to possibly unmask the physical IRQ.
>>>>>>
>>>>>> Signed-off-by: Eric Auger <eric.auger at linaro.org>
>>>>>>
>>>>>> Conflicts:
>>>>>> 	Documentation/virtual/kvm/api.txt
>>>>>> 	arch/arm/kvm/Kconfig
>>>>>>
>>>>>> Conflicts:
>>>>>> 	Documentation/virtual/kvm/api.txt
>>>>>
>>>>> We usually don't include these conflict notices when sending out
>>>>> patches.
>>>>
>>>> OK I will remove them
>>>>>
>>>>>> ---
>>>>>>  Documentation/virtual/kvm/api.txt |   4 +-
>>>>>>  arch/arm/include/uapi/asm/kvm.h   |   8 +++
>>>>>>  arch/arm/kvm/Kconfig              |   2 +
>>>>>>  arch/arm/kvm/Makefile             |   1 +
>>>>>>  arch/arm/kvm/irq.h                |  25 +++++++
>>>>>>  virt/kvm/arm/vgic.c               | 141 ++++++++++++++++++++++++++++++++++++--
>>>>>>  6 files changed, 174 insertions(+), 7 deletions(-)
>>>>>>  create mode 100644 arch/arm/kvm/irq.h
>>>>>>
>>>>>> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
>>>>>> index b4f5365..b376334 100644
>>>>>> --- a/Documentation/virtual/kvm/api.txt
>>>>>> +++ b/Documentation/virtual/kvm/api.txt
>>>>>> @@ -1339,7 +1339,7 @@ KVM_ASSIGN_DEV_IRQ. Partial deassignment of host or guest IRQ is allowed.
>>>>>>  4.52 KVM_SET_GSI_ROUTING
>>>>>>  
>>>>>>  Capability: KVM_CAP_IRQ_ROUTING
>>>>>> -Architectures: x86 ia64 s390
>>>>>> +Architectures: x86 ia64 s390 arm
>>>>>>  Type: vm ioctl
>>>>>>  Parameters: struct kvm_irq_routing (in)
>>>>>>  Returns: 0 on success, -1 on error
>>>>>> @@ -2126,7 +2126,7 @@ into the hash PTE second double word).
>>>>>>  4.75 KVM_IRQFD
>>>>>>  
>>>>>>  Capability: KVM_CAP_IRQFD
>>>>>> -Architectures: x86 s390
>>>>>> +Architectures: x86 s390 arm
>>>>>>  Type: vm ioctl
>>>>>>  Parameters: struct kvm_irqfd (in)
>>>>>>  Returns: 0 on success, -1 on error
>>>>>> diff --git a/arch/arm/include/uapi/asm/kvm.h b/arch/arm/include/uapi/asm/kvm.h
>>>>>> index ef0c878..89b864d 100644
>>>>>> --- a/arch/arm/include/uapi/asm/kvm.h
>>>>>> +++ b/arch/arm/include/uapi/asm/kvm.h
>>>>>> @@ -192,6 +192,14 @@ struct kvm_arch_memory_slot {
>>>>>>  /* Highest supported SPI, from VGIC_NR_IRQS */
>>>>>>  #define KVM_ARM_IRQ_GIC_MAX		127
>>>>>>  
>>>>>> +/* needed by IRQ routing */
>>>>>> +
>>>>>> +/* One single KVM irqchip, ie. the VGIC */
>>>>>> +#define KVM_NR_IRQCHIPS          1
>>>>>> +
>>>>>> +/* virtual interrupt controller input pins (max 480 SPI, 32 SGI/PPI) */
>>>>>> +#define KVM_IRQCHIP_NUM_PINS     256
>>>>>
>>>>> I don't even see how the comment correlates to the define.  Hmmm.  But
>>>>> since Marc asked you to change this anyhow, maybe this doesn't matter
>>>>> now.
>>>>
>>>> yes you're right. Those were the figures I was able to find in GIC400
>>>> TRM and I was confused by the fact I was not able to find them in the
>>>> code so I eventually put the same value as VGIC_NR_IRQS. I started
>>>> looking at kvm-arm64/vgic-dyn and found this dist nr_irqs but need more
>>>> time to investigate. Nethertheless his KVM_IRQCHIP_NUM_PINS is used in
>>>> generic code (irqchip).
>>>>>
>>>>>> +
>>>>>>  /* PSCI interface */
>>>>>>  #define KVM_PSCI_FN_BASE		0x95c1ba5e
>>>>>>  #define KVM_PSCI_FN(n)			(KVM_PSCI_FN_BASE + (n))
>>>>>> diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
>>>>>> index 4be5bb1..096692c 100644
>>>>>> --- a/arch/arm/kvm/Kconfig
>>>>>> +++ b/arch/arm/kvm/Kconfig
>>>>>> @@ -24,6 +24,7 @@ config KVM
>>>>>>  	select KVM_MMIO
>>>>>>  	select KVM_ARM_HOST
>>>>>>  	depends on ARM_VIRT_EXT && ARM_LPAE && !CPU_BIG_ENDIAN
>>>>>> +	select HAVE_KVM_EVENTFD
>>>>>>  	---help---
>>>>>>  	  Support hosting virtualized guest machines. You will also
>>>>>>  	  need to select one or more of the processor modules below.
>>>>>> @@ -56,6 +57,7 @@ config KVM_ARM_VGIC
>>>>>>  	bool "KVM support for Virtual GIC"
>>>>>>  	depends on KVM_ARM_HOST && OF
>>>>>>  	select HAVE_KVM_IRQCHIP
>>>>>> +	select HAVE_KVM_IRQ_ROUTING
>>>>>>  	default y
>>>>>>  	---help---
>>>>>>  	  Adds support for a hardware assisted, in-kernel GIC emulation.
>>>>>> diff --git a/arch/arm/kvm/Makefile b/arch/arm/kvm/Makefile
>>>>>> index 789bca9..29de111 100644
>>>>>> --- a/arch/arm/kvm/Makefile
>>>>>> +++ b/arch/arm/kvm/Makefile
>>>>>> @@ -21,4 +21,5 @@ obj-y += kvm-arm.o init.o interrupts.o
>>>>>>  obj-y += arm.o handle_exit.o guest.o mmu.o emulate.o reset.o
>>>>>>  obj-y += coproc.o coproc_a15.o coproc_a7.o mmio.o psci.o perf.o
>>>>>>  obj-$(CONFIG_KVM_ARM_VGIC) += $(KVM)/arm/vgic.o
>>>>>> +obj-$(CONFIG_HAVE_KVM_EVENTFD) += $(KVM)/eventfd.o $(KVM)/irqchip.o
>>>>>>  obj-$(CONFIG_KVM_ARM_TIMER) += $(KVM)/arm/arch_timer.o
>>>>>> diff --git a/arch/arm/kvm/irq.h b/arch/arm/kvm/irq.h
>>>>>> new file mode 100644
>>>>>> index 0000000..4d6fcc6
>>>>>> --- /dev/null
>>>>>> +++ b/arch/arm/kvm/irq.h
>>>>>> @@ -0,0 +1,25 @@
>>>>>> +/*
>>>>>> + * Copyright (C) 2014, STMicroelectronics
>>>>>> + * Authors: Eric Auger <eric.auger at st.com>
>>>>>
>>>>> please use the Linaro copyright notice for this.  You can add your
>>>>> @st.com e-mail in addition to the Linaro one in case you want that for
>>>>> long-term support.
>>>>
>>>> OK I will do
>>>>>
>>>>>> + *
>>>>>> + * This program is free software; you can redistribute it and/or modify
>>>>>> + * it under the terms of the GNU General Public License, version 2, as
>>>>>> + * published by the Free Software Foundation.
>>>>>> + *
>>>>>> + * This program is distributed in the hope that it will be useful,
>>>>>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>>>>>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>>>>>> + * GNU General Public License for more details.
>>>>>> + *
>>>>>> + */
>>>>>> +
>>>>>> +#ifndef __IRQ_H
>>>>>> +#define __IRQ_H
>>>>>> +
>>>>>> +#include <linux/kvm_host.h>
>>>>>> +/*
>>>>>> + * Placeholder for irqchip and irq/msi routing declarations
>>>>>> + * included in irqchip.c
>>>>>> + */
>>>>>
>>>>> But none needed now?
>>>>
>>>> Yes nothing for the time being.
>>>>>
>>>>>> +
>>>>>> +#endif
>>>>>> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
>>>>>> index 56ff9be..39afa0d 100644
>>>>>> --- a/virt/kvm/arm/vgic.c
>>>>>> +++ b/virt/kvm/arm/vgic.c
>>>>>> @@ -93,6 +93,9 @@ static struct device_node *vgic_node;
>>>>>>  #define ACCESS_WRITE_VALUE	(3 << 1)
>>>>>>  #define ACCESS_WRITE_MASK(x)	((x) & (3 << 1))
>>>>>>  
>>>>>> +static struct kvm_irq_routing_entry identity_table[VGIC_NR_IRQS];
>>>>>> +static int set_default_routing_table(struct kvm *kvm);
>>>>>> +
>>>>>>  static void vgic_retire_disabled_irqs(struct kvm_vcpu *vcpu);
>>>>>>  static void vgic_update_state(struct kvm *kvm);
>>>>>>  static void vgic_kick_vcpus(struct kvm *kvm);
>>>>>> @@ -408,11 +411,27 @@ static bool handle_mmio_clear_pending_reg(struct kvm_vcpu *vcpu,
>>>>>>  					  struct kvm_exit_mmio *mmio,
>>>>>>  					  phys_addr_t offset)
>>>>>>  {
>>>>>> -	u32 *reg = vgic_bitmap_get_reg(&vcpu->kvm->arch.vgic.irq_state,
>>>>>> +	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
>>>>>> +	unsigned int i;
>>>>>> +	bool is_assigned_irq;
>>>>>> +	DECLARE_BITMAP(old, VGIC_NR_SHARED_IRQS);
>>>>>> +	DECLARE_BITMAP(diff, VGIC_NR_SHARED_IRQS);
>>>>>> +	unsigned long *pending =
>>>>>> +		vgic_bitmap_get_shared_map(&dist->irq_state);
>>>>>> +	u32 *reg;
>>>>>
>>>>> please add a blank line between your declarations and the actual code.
>>>>
>>>> OK
>>>>>
>>>>>> +	bitmap_copy(old, pending, VGIC_NR_SHARED_IRQS);
>>>>>> +	reg = vgic_bitmap_get_reg(&vcpu->kvm->arch.vgic.irq_state,
>>>>>>  				       vcpu->vcpu_id, offset);
>>>>>>  	vgic_reg_access(mmio, reg, offset,
>>>>>>  			ACCESS_READ_VALUE | ACCESS_WRITE_CLEARBIT);
>>>>>>  	if (mmio->is_write) {
>>>>>> +		pending = vgic_bitmap_get_shared_map(&dist->irq_state);
>>>>>> +		bitmap_xor(diff, old, pending, VGIC_NR_SHARED_IRQS);
>>>>>> +		for_each_set_bit(i, diff, VGIC_NR_SHARED_IRQS) {
>>>>>> +			is_assigned_irq = kvm_irq_has_notifier(vcpu->kvm, 0, i);
>>>>>> +			if (is_assigned_irq)
>>>>>> +				kvm_notify_acked_irq(vcpu->kvm, 0, i);
>>>>>> +		}
>>>>>
>>>>> As Mark pointed out, just copy the vgic reg value and do a simple xor on
>>>>> the two instead, and factor out the two lines that check for
>>>>> is_assigned_irq and calles notify_acked so that you can give a small
>>>>> static function a semantically meaningful name and call that from both
>>>>> this function and the process_maintenance function.
>>>>
>>>> Yes I corrected this after looking into more details at the register
>>>> semantic.
>>>>>
>>>>> That being said, this whole thing feels a bit weird, I'll comment on the
>>>>> other thread.
>>>> OK
>>>>>
>>>>>>  		vgic_update_state(vcpu->kvm);
>>>>>>  		return true;
>>>>>>  	}
>>>>>> @@ -1172,6 +1191,8 @@ static bool vgic_process_maintenance(struct kvm_vcpu *vcpu)
>>>>>>  {
>>>>>>  	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
>>>>>>  	bool level_pending = false;
>>>>>> +	struct kvm *kvm;
>>>>>> +	int is_assigned_irq;
>>>>>>  
>>>>>>  	kvm_debug("MISR = %08x\n", vgic_cpu->vgic_misr);
>>>>>>  
>>>>>> @@ -1189,12 +1210,23 @@ static bool vgic_process_maintenance(struct kvm_vcpu *vcpu)
>>>>>>  			vgic_irq_clear_active(vcpu, irq);
>>>>>>  			vgic_cpu->vgic_lr[lr] &= ~GICH_LR_EOI;
>>>>>>  
>>>>>> +			kvm = vcpu->kvm;
>>>>>> +			is_assigned_irq =
>>>>>> +				kvm_irq_has_notifier(kvm, 0, irq-VGIC_NR_PRIVATE_IRQS);
>>>>>
>>>>> I think the preferred style is to keep the function call name on the
>>>>> same line as the assignment, and the do a line break on the parameter
>>>>> that doesn't fit in the line width and align that to the opening
>>>>> parenthesis on the funciton call.  At least I prefer it that way.
>>>>
>>>> OK I will change this.
>>>>>
>>>>> also spaces around the '-', please.
>>>>>
>>>>>>  			/* Any additional pending interrupt? */
>>>>>
>>>>> This comment seems to only aplly for non-assigned IRQs now, right?
>>>>
>>>> yes it does
>>>>>
>>>
>>> so move the comment in the else-clause, if it's an assigned irq then
>>> that's not what you're handling here. 
>>
>> correct
>>
>>  But come to think of it, don't we
>>> need to check if the line is still high?
>>
>> by construction it should not be possible since the physical IRQ is
>> masked by the VFIO driver.
> 
> But this mechanism is not necessarily tied to VFIO is it?
> 

Yes that's correct. but it is tied to eventfd mechanism. I need to
further check what it induces as constraints. vhost uses it. Would be
good to put this in place yo test genericity of next versions.
>>>
>>>>>> -			if (vgic_dist_irq_is_pending(vcpu, irq)) {
>>>>>> -				vgic_cpu_irq_set(vcpu, irq);
>>>>>> -				level_pending = true;
>>>>>> -			} else {
>>>>>> +			if (is_assigned_irq) {
>>>>>>  				vgic_cpu_irq_clear(vcpu, irq);
>>>>>> +				kvm_debug("EOI irqchip routed vIRQ %d\n", irq);
>>>>>> +				kvm_notify_acked_irq(kvm, 0,
>>>>>> +					irq-VGIC_NR_PRIVATE_IRQS);
>>>>>
>>>>> spaces around the '-', please.
>>>> OK
>>>>>
>>>>>> +				vgic_dist_irq_clear(vcpu, irq);
>>>>>> +			} else {
>>>>>> +				if (vgic_dist_irq_is_pending(vcpu, irq)) {
>>>>>> +					vgic_cpu_irq_set(vcpu, irq);
>>>>>> +					level_pending = true;
>>>>>> +				} else {
>>>>>> +					vgic_cpu_irq_clear(vcpu, irq);
>>>>>> +				}
>>>>>>  			}
>>>>>>  
>>>>>>  			/*
>>>>>> @@ -1627,6 +1659,8 @@ int kvm_vgic_create(struct kvm *kvm)
>>>>>>  	kvm->arch.vgic.vgic_dist_base = VGIC_ADDR_UNDEF;
>>>>>>  	kvm->arch.vgic.vgic_cpu_base = VGIC_ADDR_UNDEF;
>>>>>>  
>>>>>> +	set_default_routing_table(kvm);
>>>>>> +
>>>>>>  out_unlock:
>>>>>>  	for (; vcpu_lock_idx >= 0; vcpu_lock_idx--) {
>>>>>>  		vcpu = kvm_get_vcpu(kvm, vcpu_lock_idx);
>>>>>> @@ -2017,3 +2051,100 @@ struct kvm_device_ops kvm_arm_vgic_v2_ops = {
>>>>>>  	.get_attr = vgic_get_attr,
>>>>>>  	.has_attr = vgic_has_attr,
>>>>>>  };
>>>>>> +
>>>>>> +
>>>>>> +/*
>>>>>> + * set up a default identity routing table
>>>>>> + * The user-side can further change the routing table using
>>>>>> + * KVM_SET_GSI_ROUTING VM ioctl
>>>>>> + */
>>>>>> +
>>>>>
>>>>> don't add this whitespace between comments and a function declaration,
>>>>> please check the entire patch for this.
>>>> OK
>>>>>
>>>>>> +static int set_default_routing_table(struct kvm *kvm)
>>>>>> +{
>>>>>> +	struct kvm_irq_routing_entry;
>>>>>> +	int i;
>>>>>> +	for (i = 0; i < VGIC_NR_IRQS; i++)	{
>>>>>> +		identity_table[i].gsi = i;
>>>>>> +		identity_table[i].type = KVM_IRQ_ROUTING_IRQCHIP;
>>>>>> +		identity_table[i].u.irqchip.irqchip = 0;
>>>>>> +		identity_table[i].u.irqchip.pin = i;
>>>>>> +	}
>>>>>> +	return kvm_set_irq_routing(kvm, identity_table,
>>>>>> +				   ARRAY_SIZE(identity_table), 0);
>>>>>
>>>>> is identity table used after this stage?  If not, could you not
>>>>> dynamically allocate it and free it after use so we don't waste this
>>>>> staic allocation of memory in the kernel?
>>>>
>>>> yes this definitively can be optimized.
>>>>>
>>>>>> +}
>>>>>> +
>>>>>> +
>>>>>> +/*
>>>>>> + * Functions needed for GSI routing (used by irqchip.c)
>>>>>> + * implemented in irq_comm.c for x86 and ia64
>>>>>> + * in architecture specific files for some other archictures (powerpc)
>>>>>> + */
>>>>>
>>>>> Hmmm, this comment seems rather pointless in this file.  If you want to
>>>>> describe what this function does, then just document this functionality
>>>>> and the parameters/return value, i.e.
>>>>>
>>>>> vgic_set_assigned_irq - set state of IRQs driven by irqfd
>>>>>
>>>>> When an IRQ is raised or lowered.... blah blah blah blah.
>>>>>
>>>>> @e:   the routing entry describing...
>>>>> @kvm: the kvm struct
>>>>> and so on.
>>>>
>>>>>
>>>>> return 0 on success, -error on errors.
>>>>>
>>>>
>>>> OK
>>>>>> +
>>>>>> +static int vgic_set_assigned_irq(struct kvm_kernel_irq_routing_entry *e,
>>>>>> +			   struct kvm *kvm, int irq_source_id, int level,
>>>>>> +			   bool line_status)
>>>>>> +{
>>>>>> +	unsigned int spi = e->irqchip.pin + VGIC_NR_PRIVATE_IRQS;
>>>>>
>>>>> I noticed this in the changelogs too, where's the rationale/API
>>>>> documentaiton for the use of irqchip.pin and its semantics on ARM?
>>>>>
>>>>> Should we add something in Documentation/virtual/kvm/api.txt ?
>>>>
>>>> - in 4.24 KVM_CREATE_IRQCHIP I might add that similarly to s390 a dummy
>>>> identity table is created.
>>>>
>>>> - KVM_CAP_IRQFD says "kvm_irqfd.gsi specifies the irqchip pin toggled by
>>>> this event.  When an event is triggered on the eventfd, an interrupt is
>>>> injected into the guest using the specified gsi pin"
>>>>
>>>> Assuming the standard use case is to use an identity/dummy GSI table the
>>>> irqchip.pin still remains the "irqchip pin" toggled on eventfd.
>>>>
>>>> By the way I might remove the mention to the use case where the gsi !=
>>>> irqchip.pin.
>>>>
>>>> Now when reading 4.25 KVM_IRQ_LINE, it is said that it is used to inject
>>>> a GSI as well.
>>>>
>>>> Now on ARM The GSI has the following content.
>>>>
>>>>   bits:  | 31 ... 24 | 23  ... 16 | 15    ...    0 |
>>>>   field: | irq_type  | vcpu_index |     irq_id     |
>>>>
>>>> As such that's true that currently there is an inconsistency.
>>>>
>>>> Currently my GSI == spi number whereas the GSI as injected by
>>>> KVM_IRQ_LINE has the above format.
>>>
>>> But the ARM use of KVM_IRQFD is not clearly documented.
>>>
>>> I think this patch needs to include adding arm to the "Architectures"
>>> list in api.txt and clearly document what the semantics of the fields of
>>> the struct kvm_irqfd are.
>>
>> Yes indeed, this deserves such clarification, all the more so there is
>> the above inconsistency. We come back to the discussion about relevance
>> of routing other things than SPI actually. use case?
>>>
>>>>>
>>>>>> +
>>>>>> +	if (irq_source_id == KVM_USERSPACE_IRQ_SOURCE_ID) {
>>>>>> +		/*
>>>>>> +		 * This path is not tested yet,
>>>>>> +		 * only irqchip with resampler was exercised
>>>>>> +		 */
>>>>>> +		kvm_vgic_inject_irq(kvm, 0, spi, level);
>>>>>
>>>>> hmmm, stuff like this definitely makes this patch an RFC.
>>>>
>>>> Yes I acknowledge I shall step back to RFC. I was a bit eager.
>>>
>>> The alternative is to return an error and put a big fat comment saying
>>> this is NOT IMPLEMENTED yet.
>>
>> actually this KVM_USERSPACE_IRQ_SOURCE_ID path is entered when injecting
>> IRQs without resamplerfd (is edge sensitive ones). As such we can link
>> this issue to your above related comment.
>>
>> Anyway I will move it to RFC since there are quite a lot of open points,
>> especially the next one.
>>>
>>>>>
>>>>>> +	} else if (irq_source_id == KVM_IRQFD_RESAMPLE_IRQ_SOURCE_ID) {
>>>>>> +		if (level == 1) {
>>>>>
>>>>> This seems very wrong.  What if an external device raises a
>>>>> level-triggered IRQ, but then lowers it again, without the guest was
>>>>> ever running, now you have to wait until the guest sees the (now
>>>>> inactive) interrupt and EOIs it before the interrupt is lowered on the
>>>>> vgic?
>>>> Hum I am bit confused here. when you enter that code, this means an
>>>> irqfd was triggered. This irqfd was registered by some user code that is
>>>> supposed to be alive and do the proper action to complete the IRQ. in my
>>>> case the xgmac toggles down the IRQ if anyone resets the IRQ status
>>>> register? this action is done in the xgmac guest ISR. A device
>>>> reset/error? might provoke the IRQ pin toggling done. Do you see any
>>>> other events?
>>>
>>> Sure, ok, forget about the 'guest wasn't running', but if the guest ISR
>>> is run with guest interrupts disabled and resets the xgmac device (or
>>> imagine some other device that just lowers the level triggered interrupt
>>> after some timeout), then how does the current code handle this?
>> yes I aknowledge current implementation relies on 2 assumptions:
>> - either the EOI happens or
>> - the pending IRQ is cleared through ICPENDRn access
>>
>> and in case of reset typically we are in trouble to clear the distributor.
>>
>> But I currently do not see any way to detect the device IRQ is toggled
>> down without trapping something.
> 
> I guess that's hard without trapping or probing the VFIO driver, but the
> current scheme just seems to fragile and seems to be built around a very
> specific behavior of your guest and host, and not supporting a generic
> solution.
> 
> Either we will have to trap MMIO to know what's going on, or perhaps
> probe the VFIO state when the interrupt is enabled on the vgic by the
> guest.
> 

agreed.
>>>
>>>>>
>>>>>> +			kvm_debug("Inject irqchip routed vIRQ %d\n",
>>>>>> +					e->irqchip.pin);
>>>>>> +			kvm_vgic_inject_irq(kvm, 0, spi, level);
>>>>>> +			/*
>>>>>> +			 * toggling down vIRQ wire is directly handled in
>>>>>> +			 * process_maintenance for this reason:
>>>>>> +			 * irqfd_resampler_ack is called in
>>>>>> +			 * process_maintenance which holds the dist lock.
>>>>>> +			 * irqfd_resampler_ack calls kvm_set_irq
>>>>>> +			 * which ends_up calling kvm_vgic_inject_irq.
>>>>>> +			 * This later attempts to take the lock -> deadlock!
>>>>>> +			 */
>>>>>
>>>>> Not sure I understand this comment.  What are we trying to achieve, are
>>>>> we using some sort of a workaround to avoid a deadlock?
>>>>
>>>> What I wanted to point out here is I would have prefered to handle both
>>>> levels 0 and 1 in a symetrical manner. irqfd_resampler_ack (in eventfd)
>>>> is calling kvm_set_irq with level 0. This would be the prefered way to
>>>> toggle down the SPI at GIC input instead of doing this in
>>>> process_maintenance in a dirty manner. However this does work because
>>>> irqfd_resampler_ack is called in process_maintenance (the place where
>>>> the EOI is analyzed). process_maintenance holds the dist lock and would
>>>> eventually call kvm_vgic_inject_irq which also attempts to take the lock.
>>>> 	
>>>
>>> I'm afraid that's too much of a hack.  There's an external mechanism to
>>> set an interrupt line to active (level=1) or inactive (level=0) and we
>>> must support both.
>>
>>> The fact that vgic_process_maintenance() can set the interrupt line to
>>> inactive is just something we exploit to properly handle level-triggered
>>> interrupts, but the main API to the VGIC must absolutely be supported.
>>>
>>> Am I completely wrong here?
>>
>> Well I am not sure what you call here the VGIC API?  Is it
>> kvm_vgic_inject_irq? I agree with you on the fact It would be cleaner to
>> use this later in both edge transitions.
> 
> Yes, the "set" function pointer, which eventually calls
> vgic_set_assigned_irq(), is your API.  If that API carries semantics
> that you can raise AND lower the irq through that API, then we need to
> support that I think.
agreed too
> 
> Thanks,
> -Christoffer
> 




More information about the linux-arm-kernel mailing list