completion timeouts with pin-based interrupts in QEMU hw/nvme
Keith Busch
kbusch at kernel.org
Thu Jan 12 09:51:25 PST 2023
On Thu, Jan 12, 2023 at 06:45:55PM +0100, Klaus Jensen wrote:
> On Jan 12 09:34, Keith Busch wrote:
> > On Thu, Jan 12, 2023 at 02:10:51PM +0100, Klaus Jensen wrote:
> > >
> > > The pin-based interrupt logic in hw/nvme seems sound enough to me, so I
> > > am wondering if there is something going on with the kernel driver (but
> > > I certainly do not rule out that hw/nvme is at fault here, since
> > > pin-based interrupts has also been a source of several issues in the
> > > past).
> >
> > Does it work if you change the pci_irq_assert() back to pci_irq_pulse()?
> > While probably not the "correct" thing to do, it has better results in
> > my testing.
> >
>
> A simple s/pci_irq_assert/pci_irq_pulse broke the device. However,
>
> diff --git i/hw/nvme/ctrl.c w/hw/nvme/ctrl.c
> index 03760ddeae8c..0fc46dcb9ec4 100644
> --- i/hw/nvme/ctrl.c
> +++ w/hw/nvme/ctrl.c
> @@ -477,6 +477,7 @@ static void nvme_irq_check(NvmeCtrl *n)
> return;
> }
> if (~intms & n->irq_status) {
> + pci_irq_deassert(&n->parent_obj);
> pci_irq_assert(&n->parent_obj);
> } else {
> pci_irq_deassert(&n->parent_obj);
>
>
> seems to do the trick (pulse is the other way around, assert, then
> deassert).
>
> Probably not the "correct" thing to do, but I'll take it since it seems
> to fix it. On a simple boot loop I got the timeout about 1 out of 5. I'm
> on ~20 runs now and have not encountered it.
Yep, that looks good. It's sounding like something with the CPU irq
handling is treating the level interrupt as edge triggered.
More information about the Linux-nvme
mailing list