completion timeouts with pin-based interrupts in QEMU hw/nvme

Guenter Roeck linux at roeck-us.net
Tue Jan 17 08:09:33 PST 2023


On Mon, Jan 16, 2023 at 09:58:13PM -0700, Keith Busch wrote:
> On Mon, Jan 16, 2023 at 10:14:07PM +0100, Klaus Jensen wrote:
> > I noticed that the Linux driver does not use the INTMS/INTMC registers
> > to mask interrupts on the controller while processing CQEs. While not
> > required by the spec, it is *recommended* in setups not using MSI-X to
> > reduce the risk of spurious and/or missed interrupts.
> 
> That's assuming completions are deferred to a bottom half. We don't do
> that by default in Linux nvme, though you can ask the driver to do that
> if you want.
>  
> > With the patch below, running 100 boot iterations, no timeouts were
> > observed on QEMU emulated riscv64 or mips64.
> >
> > No changes are required in the QEMU hw/nvme interrupt logic.
> 
> Yeah, I can see why: it forces the irq line to deassert then assert,
> just like we had forced to happen within the device side patches. Still,
> none of that is supposed to be necessary, but this idea of using these
> registers is probably fine.

There is still no answer as to why this would be necessary in the first
place, on either side. In my opinion, unless someone can confirm that the
problem is also seen with real hardware, we should assume that it happens
on the QEMU side and address it there.

Guenter

> 
> >  static irqreturn_t nvme_irq(int irq, void *data)
> > +{
> > +       struct nvme_queue *nvmeq = data;
> > +       struct nvme_dev *dev = nvmeq->dev;
> > +       u32 mask = 1 << nvmeq->cq_vector;
> > +       irqreturn_t ret = IRQ_NONE;
> > +       DEFINE_IO_COMP_BATCH(iob);
> > +
> > +       writel(mask, dev->bar + NVME_REG_INTMS);
> > +
> > +       if (nvme_poll_cq(nvmeq, &iob)) {
> > +               if (!rq_list_empty(iob.req_list))
> > +                       nvme_pci_complete_batch(&iob);
> > +               ret = IRQ_HANDLED;
> > +       }
> > +
> > +       writel(mask, dev->bar + NVME_REG_INTMC);
> > +
> > +       return ret;
> > +}
> 
> If threaded interrupts are used, you'll want to do the masking in
> nvme_irq_check(), then clear it in the threaded handler instead of doing
> both in the same callback.
> 
> > +static irqreturn_t nvme_irq_msix(int irq, void *data)
> >  {
> >         struct nvme_queue *nvmeq = data;
> >         DEFINE_IO_COMP_BATCH(iob);
> > @@ -1602,12 +1623,13 @@ static int queue_request_irq(struct nvme_queue *nvmeq)
> >  {
> >         struct pci_dev *pdev = to_pci_dev(nvmeq->dev->dev);
> >         int nr = nvmeq->dev->ctrl.instance;
> > +       irq_handler_t handler = pdev->msix_enabled ? nvme_irq_msix : nvme_irq;
> > 
> >         if (use_threaded_interrupts) {
> >                 return pci_request_irq(pdev, nvmeq->cq_vector, nvme_irq_check,
> > -                               nvme_irq, nvmeq, "nvme%dq%d", nr, nvmeq->qid);
> > +                               handler, nvmeq, "nvme%dq%d", nr, nvmeq->qid);
> >         } else {
> > -               return pci_request_irq(pdev, nvmeq->cq_vector, nvme_irq,
> > +               return pci_request_irq(pdev, nvmeq->cq_vector, handler,
> >                                 NULL, nvmeq, "nvme%dq%d", nr, nvmeq->qid);
> >         }
> >  }
> > 
> > 
> 
> 



More information about the Linux-nvme mailing list