[RFC PATCH] KVM: arm/arm64: vgic: change condition for level interrupt resampling

Yang, Shunyong shunyong.yang at hxt-semitech.com
Thu Mar 8 19:14:18 PST 2018


Hi, Eric, Marc and Christoffer,

On Thu, 2018-03-08 at 19:12 +0100, Auger Eric wrote:
> Hi Marc, Christoffer,
> 
> On 08/03/18 18:28, Marc Zyngier wrote:
> > 
> > On Thu, 08 Mar 2018 16:19:00 +0000,
> > Christoffer Dall wrote:
> > > 
> > > 
> > > On Thu, Mar 08, 2018 at 11:54:27AM +0000, Marc Zyngier wrote:
> > > > 
> > > > On 08/03/18 09:49, Marc Zyngier wrote:
> > > > > 
> > > > > [updated Christoffer's email address]
> > > > > 
> > > > > Hi Shunyong,
> > > > > 
> > > > > On 08/03/18 07:01, Shunyong Yang wrote:
> > > > > > 
> > > > > > When resampling irqfds is enabled, level interrupt should
> > > > > > be
> > > > > > de-asserted when resampling happens. On page 4-47 of GIC v3
> > > > > > specification IHI0069D, it said,
> > > > > > "When the PE acknowledges an SGI, a PPI, or an SPI at the
> > > > > > CPU
> > > > > > interface, the IRI changes the status of the interrupt to
> > > > > > active
> > > > > > and pending if:
> > > > > > • It is an edge-triggered interrupt, and another edge has
> > > > > > been
> > > > > > detected since the interrupt was acknowledged.
> > > > > > • It is a level-sensitive interrupt, and the level has not
> > > > > > been
> > > > > > deasserted since the interrupt was acknowledged."
> > > > > > 
> > > > > > GIC v2 specification IHI0048B.b has similar description on
> > > > > > page
> > > > > > 3-42 for state machine transition.
> > > > > > 
> > > > > > When some VFIO device, like mtty(8250 VFIO mdev emulation
> > > > > > driver
> > > > > > in samples/vfio-mdev) triggers a level interrupt, the
> > > > > > status
> > > > > > transition in LR is pending-->active-->active and pending.
> > > > > > Then it will wait resampling to de-assert the interrupt.
> > > > > > 
> > > > > > Current design of lr_signals_eoi_mi() will return false if
> > > > > > state
> > > > > > in LR is not invalid(Inactive). It causes resampling will
> > > > > > not happen
> > > > > > in mtty case.
> > > > > Let me rephrase this, and tell me if I understood it
> > > > > correctly:
> > > > > 
> > > > > - A level interrupt is injected, activated by the guest (LR
> > > > > state=active)
> > > > > - guest exits, re-enters, (LR state=pending+active)
> > > > > - guest EOIs the interrupt (LR state=pending)
> > > > > - maintenance interrupt
> > > > > - we don't signal the resampling because we're not in an
> > > > > invalid state
> > > > > 
> > > > > Is that correct?
> > > > > 
> > > > > That's an interesting case, because it seems to invalidate
> > > > > some of the 
> > > > > optimization that went in over a year ago.
> > > > > 
> > > > > 096f31c4360f KVM: arm/arm64: vgic: Get rid of MISR and EISR
> > > > > fields
> > > > > b6095b084d87 KVM: arm/arm64: vgic: Get rid of unnecessary
> > > > > save_maint_int_state
> > > > > af0614991ab6 KVM: arm/arm64: vgic: Get rid of unnecessary
> > > > > process_maintenance operation
> > > > > 
> > > > > We could compare the value of the LR before the guest entry
> > > > > with
> > > > > the value at exit time, but we still could miss it if we have
> > > > > a
> > > > > transition such as P+A -> P -> A and assume a long enough
> > > > > propagation
> > > > > delay for the maintenance interrupt (which is very likely).
> > > > > 
> > > > > In essence, we have lost the benefit of EISR, which was to
> > > > > give us a
> > > > > way to deal with asynchronous signalling.
> > > > > 
> > > > > > 
> > > > > > 
> > > > > > This will cause interrupt fired continuously to guest even
> > > > > > 8250 IIR
> > > > > > has no interrupt. When 8250's interrupt is configured in
> > > > > > shared mode,
> > > > > > it will pass interrupt to other drivers to handle. However,
> > > > > > there
> > > > > > is no other driver involved. Then, a "nobody cared" kernel
> > > > > > complaint
> > > > > > occurs.
> > > > > > 
> > > > > > / # cat /dev/ttyS0
> > > > > > [    4.826836] random: crng init done
> > > > > > [    6.373620] irq 41: nobody cared (try booting with the
> > > > > > "irqpoll"
> > > > > > option)
> > > > > > [    6.376414] CPU: 0 PID: 1307 Comm: cat Not tainted
> > > > > > 4.16.0-rc4 #4
> > > > > > [    6.378927] Hardware name: linux,dummy-virt (DT)
> > > > > > [    6.380876] Call trace:
> > > > > > [    6.381937]  dump_backtrace+0x0/0x180
> > > > > > [    6.383495]  show_stack+0x14/0x1c
> > > > > > [    6.384902]  dump_stack+0x90/0xb4
> > > > > > [    6.386312]  __report_bad_irq+0x38/0xe0
> > > > > > [    6.387944]  note_interrupt+0x1f4/0x2b8
> > > > > > [    6.389568]  handle_irq_event_percpu+0x54/0x7c
> > > > > > [    6.391433]  handle_irq_event+0x44/0x74
> > > > > > [    6.393056]  handle_fasteoi_irq+0x9c/0x154
> > > > > > [    6.394784]  generic_handle_irq+0x24/0x38
> > > > > > [    6.396483]  __handle_domain_irq+0x60/0xb4
> > > > > > [    6.398207]  gic_handle_irq+0x98/0x1b0
> > > > > > [    6.399796]  el1_irq+0xb0/0x128
> > > > > > [    6.401138]  _raw_spin_unlock_irqrestore+0x18/0x40
> > > > > > [    6.403149]  __setup_irq+0x41c/0x678
> > > > > > [    6.404669]  request_threaded_irq+0xe0/0x190
> > > > > > [    6.406474]  univ8250_setup_irq+0x208/0x234
> > > > > > [    6.408250]  serial8250_do_startup+0x1b4/0x754
> > > > > > [    6.410123]  serial8250_startup+0x20/0x28
> > > > > > [    6.411826]  uart_startup.part.21+0x78/0x144
> > > > > > [    6.413633]  uart_port_activate+0x50/0x68
> > > > > > [    6.415328]  tty_port_open+0x84/0xd4
> > > > > > [    6.416851]  uart_open+0x34/0x44
> > > > > > [    6.418229]  tty_open+0xec/0x3c8
> > > > > > [    6.419610]  chrdev_open+0xb0/0x198
> > > > > > [    6.421093]  do_dentry_open+0x200/0x310
> > > > > > [    6.422714]  vfs_open+0x54/0x84
> > > > > > [    6.424054]  path_openat+0x2dc/0xf04
> > > > > > [    6.425569]  do_filp_open+0x68/0xd8
> > > > > > [    6.427044]  do_sys_open+0x16c/0x224
> > > > > > [    6.428563]  SyS_openat+0x10/0x18
> > > > > > [    6.429972]  el0_svc_naked+0x30/0x34
> > > > > > [    6.431494] handlers:
> > > > > > [    6.432479] [<000000000e9fb4bb>] serial8250_interrupt
> > > > > > [    6.434597] Disabling IRQ #41
> > > > > > 
> > > > > > This patch changes the lr state condition in
> > > > > > lr_signals_eoi_mi() from
> > > > > > invalid(Inactive) to active and pending to avoid this.
> > > > > > 
> > > > > > I am not sure about the original design of the condition of
> > > > > > invalid(active). So, This RFC is sent out for comments.
> > > > > > 
> > > > > > Cc: Joey Zheng <yu.zheng at hxt-semitech.com>
> > > > > > Signed-off-by: Shunyong Yang <shunyong.yang at hxt-semitech.co
> > > > > > m>
> > > > > > ---
> > > > > >  virt/kvm/arm/vgic/vgic-v2.c | 4 ++--
> > > > > >  virt/kvm/arm/vgic/vgic-v3.c | 4 ++--
> > > > > >  2 files changed, 4 insertions(+), 4 deletions(-)
> > > > > > 
> > > > > > diff --git a/virt/kvm/arm/vgic/vgic-v2.c
> > > > > > b/virt/kvm/arm/vgic/vgic-v2.c
> > > > > > index e9d840a75e7b..740ee9a5f551 100644
> > > > > > --- a/virt/kvm/arm/vgic/vgic-v2.c
> > > > > > +++ b/virt/kvm/arm/vgic/vgic-v2.c
> > > > > > @@ -46,8 +46,8 @@ void vgic_v2_set_underflow(struct
> > > > > > kvm_vcpu *vcpu)
> > > > > >  
> > > > > >  static bool lr_signals_eoi_mi(u32 lr_val)
> > > > > >  {
> > > > > > -	return !(lr_val & GICH_LR_STATE) && (lr_val &
> > > > > > GICH_LR_EOI) &&
> > > > > > -	       !(lr_val & GICH_LR_HW);
> > > > > > +	return !((lr_val & GICH_LR_STATE) ^ GICH_LR_STATE)
> > > > > > &&
> > > > > That feels very wrong. You're now signalling the resampling
> > > > > in both
> > > > > invalid and pending+active, and the latter state doesn't mean
> > > > > you've
> > > > > EOIed anything. You're now over-signalling, and signalling
> > > > > the
> > > > > wrong event.

I am using XOR GICH_LR_STATE(0b'11), so only 0b'11(P&A) will be
signaled. Other state will be false.
And I am curious why the EOI bit in LR indicate the end of interrupt
regardless of the state? Please bear with me as I am a newbie in this
part.

> > > > > 
> > > > > > 
> > > > > > +	       (lr_val & GICH_LR_EOI) && !(lr_val &
> > > > > > GICH_LR_HW);
> > > > > >  }
> > > > > >  
> > > > > >  /*
> > > > > > diff --git a/virt/kvm/arm/vgic/vgic-v3.c
> > > > > > b/virt/kvm/arm/vgic/vgic-v3.c
> > > > > > index 6b329414e57a..43111bba7af9 100644
> > > > > > --- a/virt/kvm/arm/vgic/vgic-v3.c
> > > > > > +++ b/virt/kvm/arm/vgic/vgic-v3.c
> > > > > > @@ -35,8 +35,8 @@ void vgic_v3_set_underflow(struct
> > > > > > kvm_vcpu *vcpu)
> > > > > >  
> > > > > >  static bool lr_signals_eoi_mi(u64 lr_val)
> > > > > >  {
> > > > > > -	return !(lr_val & ICH_LR_STATE) && (lr_val &
> > > > > > ICH_LR_EOI) &&
> > > > > > -	       !(lr_val & ICH_LR_HW);
> > > > > > +	return !((lr_val & ICH_LR_STATE) ^ ICH_LR_STATE)
> > > > > > &&
> > > > > > +	       (lr_val & ICH_LR_EOI) && !(lr_val &
> > > > > > ICH_LR_HW);
> > > > > >  }
> > > > > >  
> > > > > >  void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu)
> > > > > > 
> > > > > Assuming I understand the issue correctly, I cannot really
> > > > > see how
> > > > > to solve this without reintroducing EISR, which sucks
> > > > > majorly.
> > > > > 
> > > > > I'll try to cook something shortly and we can all have a good
> > > > > fight about how crap this is.
> > > > Here's what I came up with. I don't really like it, but that's
> > > > the least invasive this I could come up with. Please let me
> > > > know if that helps with your test case. Note that I have only
> > > > boot-tested this on a sample of 1 machine, so I don't expect
> > > > this
> > > > to be perfect.
> > > > 
> > > > Also, any guideline on how to reproduce this would be much
> > > > appreciated.
> > > > I never used this mdev/mtty thing, so please bear with me.
> > > > 
> > > > Thanks,
> > > > 
> > > > 	M.

The mdev/mtty documentation is at Documentation/vfio-mediated-
device.txt. It docmented how to enable mtty device.
And support for "vfio-pci,sysfsdev" should be availabe in your qemu
version (I compiled the latest version).
Following is my commond to run qemu with mdev support,
"qemu-system-aarch64 -m 1024 -cpu host -M virt,gic_version=3 -nographic
\
-kernel /home/yangsy/up-kvm/arch/arm64/boot/Image.gz \
-initrd /home/yangsy/kvm/ramdisk/initrd.img \
-netdev user,id=eth0 -device virtio-net-device,netdev=eth0 -enable-kvm
\
-append "root=/dev/ram rdinit=/sbin/init" \
-device vfio-pci,sysfsdev=/sys/bus/mdev/devices/83b8f4f2-509f-382f-
3c1e-e6bfe0fa1001
"
For just test this vgic case, type "cat /dev/ttyS0" in guest. But if
test read/write multiple bytes, please apply following patch also
https://patchwork.kernel.org/patch/10267039/

> > > > 
> > > > From 66a7c4cfc1029b0169dd771e196e2876ba3f17b1 Mon Sep 17
> > > > 00:00:00 2001
> > > > From: Marc Zyngier <marc.zyngier at arm.com>
> > > > Date: Thu, 8 Mar 2018 11:14:06 +0000
> > > > Subject: [PATCH] KVM: arm/arm64: Do not rely on LR state to
> > > > guess EOI MI
> > > >  status
> > > > 
> > > > We so far rely on the LR state to decide whether the guest has
> > > > EOI'd a level interrupt or not. While this looks like a good
> > > > idea on the surface, it leads to a couple of annoying corner
> > > > cases:
> > > > 
> > > > Example 1: (P = Pending, A = Active, MI = Maintenance
> > > > Interrupt)
> > > > P -> guest IAR -> A -> exit/entry -> P+A -> guest EOI -> P ->
> > > > MI
> > > Do we really get an EOI maintenance interrupt here?  Reading the
> > > MISR
> > > and EISR descriptions make me thing this is not the case...
> Hum yes in EISR it is said that ICH_LR.State = 0b00!
> > 
> > 
> > Yeah, it looks like I always want EISR to do what I want, and not
> > to
> > do what it does. Man, this thing is such a piece of crap.
> > 
> > OK, scratch that. We need to do it without the help of the HW.

If convenient, maybe we can get something from HW gus. :-)

Hi, Marc,

Do you need me to test the patch you posted for EISR? As it seems there
are some things need more discussion.

> > 
> > > 
> > > > 
> > > > The state is now pending, we've really EOI'd the interrupt, and
> > > > yet lr_signals_eoi_mi() returns false, since the state is not
> > > > 0.
> > > > The result is that we won't signal anything on the
> > > > corresponding
> > > > irqfd, which people complain about. Meh.
> > > So the core of the problem is that when we've entered the guest
> > > with
> > > PENDING+ACTIVE and when we exit (for some reason) we don't signal
> > > the
> > > resamplefd, right?  The solution seems to me that we don't ever
> > > do
> > > PENDING+ACTIVE if you need to resample after each
> > > deactivate.  What
> > > would be the point of appending a pending state that you only
> > > know to be
> > > valid after a resample anyway?
> > The question is then to identify that a given source needs to be
> > signalled back to VFIO. Calling into the eventfd code on the hot
> > path
> > is pretty horrid (I'm not sure if we can really call into this with
> > interrupts disabled, for example).
> > 
> > > 
> > > 
> > > > 
> > > > 
> > > > Example 2:
> > > > P+A -> guest EOI -> P -> delayed MI -> guest IAR -> A -> MI
> > > > fires
> > > We could be more clever and do the following calculation on every
> > > exit:
> > > 
> > > If you enter with P, and exit with either A or 0, then signal.
> > > 
> > > If you enter with P+A, and you exit with either P, A, or 0, then
> > > signal.
> > > 
> > > Wouldn't that also solve it?  (Although I have a feeling you'd
> > > miss some
> > > exits in this case).
> > I'd be more confident if we did forbid P+A for such interrupts
> > altogether, as they really feel like another kind of HW interrupt.
> the LR P+A looks strange to me too. all the more so it may cause the
> same IRQ to be acked twice?
> 
> P -> A -> 0 (resample). Doesn't our issue come from the fact we
> reinject
> the P in LR until the line level is deasserted?
> > 
> > 
> > Eric: Is there any way to get a callback from the eventfd code to
> > flag
> > a given irq as requiring a notification on EOI?
> bool kvm_irq_has_notifier(struct kvm *kvm, unsigned irqchip, unsigned
> pin) was used in the past. I think it does what you want.
> 
> Thanks
> 
> Eric
> > 
> > 
> > Thanks,
> > 
> > 	M.
> > 

I have added some logs to compare level interrupt between pl011(hwirq =
33) and mtty (hwirq = 36). In mtty case, vgic_queue_irq_unlock() is
called twice. But only called once in pl011.

following is the log,
===Without my patch===
###PL011###

<4>[  180.598266] kvm_vgic_inject_irq 453 irq:33 enabled:1 config:1
latch:0 level:1
<4>[  180.604460] ##vgic_queue_irq_unlock 388 irq->intid:33 enable:1
level:1
<4>[  180.604540] ==>90a0020000000021(active)
<4>[  180.614878] ==>d0a0020000000021(P&A)
<4>[  180.618415] kvm_vgic_inject_irq 453 irq:33 enabled:1 config:1
latch:0 level:0
<4>[  180.625508] ==>90a0020000000021(active)
<4>[  180.629343] ==>10a0020000000021(inactive)

###mtty-vfio###
<4>[  223.123329] kvm_vgic_inject_irq 453 irq:36 enabled:0 config:1
latch:0 level:1
<4>[  223.129736] ##vgic_queue_irq_unlock 388 irq->intid:36 enable:1
level:1
<4>[  223.136027] ==>50a0020000000024(pending)
<4>[  223.139954] ##vgic_queue_irq_unlock 388 irq->intid:36 enable:1
level:1
<4>[  223.146460] ==>90a0020000000024(active)
<4>[  223.150273] ==>d0a0020000000024(P&A)
<4>[  223.153827] ==>90a0020000000024(active)
<4>[  223.157668] ==>d0a0020000000024(P&A)
...........cyclic...

I rembered in some tests the state change is cyclic P->A->P&A. But it
seems I cannot reproduce it. Is output LR state
in kvm_vgic_inject_irq() reliable?

===With my patch===
###PL011###
<4>[  114.798528] kvm_vgic_inject_irq 453 irq:33 enabled:1 config:1
latch:0 level:1
<4>[  114.804743] ##vgic_queue_irq_unlock 388 irq->intid:33 enable:1
level:1
<4>[  114.804796] ==>90a0020000000021(active)
<4>[  114.815077] ==>d0a0020000000021(P&A)
<4>[  114.818628] kvm_vgic_inject_irq 453 irq:33 enabled:1 config:1
latch:0 level:0
<4>[  114.825726] ==>90a0020000000021(active)
<4>[  114.829560] ==>10a0020000000021(inactive)

###mtty-vfio###

<4>[  161.579083] kvm_vgic_inject_irq 453 irq:36 enabled:0 config:1
latch:0 level:1
<4>[  161.585419] ##vgic_queue_irq_unlock 388 irq->intid:36 enable:1
level:1
<4>[  161.591780] ==>50a0020000000024(pending)
<4>[  161.595708] ##vgic_queue_irq_unlock 388 irq->intid:36 enable:1
level:1
<4>[  161.602204] ==>90a0020000000024(active)
<4>[  161.606023] ==>d0a0020000000024(P&A)
<4>[  161.609561] kvm_vgic_inject_irq 453 irq:36 enabled:1 config:1
latch:0 level:0
<4>[  161.616693] ==>10a0020000000024(inactive)
<4>[  161.620745] kvm_vgic_inject_irq 453 irq:36 enabled:1 config:1
latch:0 level:1
<4>[  161.627800] ##vgic_queue_irq_unlock 388 irq->intid:36 enable:1
level:1
<4>[  161.627849] ==>90a0020000000024(active)
<4>[  161.640076] ==>d0a0020000000024(P&A)
<4>[  161.642689] kvm_vgic_inject_irq 453 irq:36 enabled:1 config:1
latch:0 level:0
<4>[  161.649822] ==>10a0020000000024(inactive)


Following is the test patch,

diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c
old mode 100644
new mode 100755
index 6b329414e57a..00fb83b11f43
--- a/virt/kvm/arm/vgic/vgic-v3.c
+++ b/virt/kvm/arm/vgic/vgic-v3.c
@@ -26,6 +26,9 @@
 static bool common_trap;
 static bool gicv4_enable;
 
+int monitor_irq = 36;
+module_param(monitor_irq, int, S_IRUGO | S_IWUSR);
+
 void vgic_v3_set_underflow(struct kvm_vcpu *vcpu)
 {
 	struct vgic_v3_cpu_if *cpuif = &vcpu->arch.vgic_cpu.vgic_v3;
@@ -39,6 +42,8 @@ static bool lr_signals_eoi_mi(u64 lr_val)
 	       !(lr_val & ICH_LR_HW);
 }
 
+u64 last_val = 0;
+
 void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu)
 {
 	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
@@ -46,6 +51,7 @@ void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu)
 	u32 model = vcpu->kvm->arch.vgic.vgic_model;
 	int lr;
 	unsigned long flags;
+	char *str[]={"inactive", "pending", "active", "P&A"};
 
 	cpuif->vgic_hcr &= ~ICH_HCR_UIE;
 
@@ -60,6 +66,13 @@ void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu)
 			intid = val & GICH_LR_VIRTUALID;
 
 		/* Notify fds when the guest EOI'ed a level-triggered
IRQ */
+		if (intid == monitor_irq) {
+			if (last_val != val) {
+				printk("==>%llx(%s)\n", val, str[(val
>> 62) & 0x03 ]);
+				last_val = val;
+			}
+		}
+
 		if (lr_signals_eoi_mi(val) && vgic_valid_spi(vcpu-
>kvm, intid))
 			kvm_notify_acked_irq(vcpu->kvm, 0,
 					     intid - VGIC
_NR_PRIVATE_IRQS);
diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
old mode 100644
new mode 100755
index 07126a3b1908..9c284623ea23
--- a/virt/kvm/arm/vgic/vgic.c
+++ b/virt/kvm/arm/vgic/vgic.c
@@ -31,6 +31,8 @@
 #define DEBUG_SPINLOCK_BUG_ON(p)
 #endif
 
+extern int monitor_irq;
+
 struct vgic_global kvm_vgic_global_state __ro_after_init = {
 	.gicv3_cpuif = STATIC_KEY_FALSE_INIT,
 };
@@ -381,6 +383,12 @@ bool vgic_queue_irq_unlock(struct kvm *kvm, struct
vgic_irq *irq,
 	kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
 	kvm_vcpu_kick(vcpu);
 
+	if (irq->intid == monitor_irq) {
+		printk("##%s %d irq->intid:%d enable:%d level:%d\n",
+			__func__, __LINE__, irq->intid,
+			irq->enabled, irq->line_level);
+		//dump_stack();
+	}
 	return true;
 }
 
@@ -401,6 +409,9 @@ bool vgic_queue_irq_unlock(struct kvm *kvm, struct
vgic_irq *irq,
  * level-sensitive interrupts.  You can think of the level parameter
as 1
  * being HIGH and 0 being LOW and all devices being active-HIGH.
  */
+
+bool monitor_vm_entry_start = false;
+
 int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int
intid,
 			bool level, void *owner)
 {
@@ -437,6 +448,13 @@ int kvm_vgic_inject_irq(struct kvm *kvm, int
cpuid, unsigned int intid,
 	else
 		irq->pending_latch = true;
 
+	if (irq->intid == monitor_irq) {
+		printk("%s %d irq:%d enabled:%d config:%d latch:%d
level:%d\n",
+			__func__, __LINE__, irq->intid, irq->enabled,
irq->config,
+			irq->pending_latch, irq->line_level);
+		monitor_vm_entry_start = true;
+	}
+
 	vgic_queue_irq_unlock(kvm, irq, flags);
 	vgic_put_irq(kvm, irq);

Thanks.
Shunyong.


More information about the linux-arm-kernel mailing list