[PATCH 3/3] x86/quirks: Add parameter to clear MSIs early on boot
okaya at kernel.org
Thu Oct 18 13:30:22 PDT 2018
On 10/18/2018 4:13 PM, Guilherme G. Piccoli wrote:
>> These kinds of issues are usually fixed by fixing the network driver's
>> shutdown routine to ensure that MSI interrupts are cleared there.
> Sinan, I'm not sure shutdown handlers for drivers are called in panic
> kexec (I remember an old experiment I did: loading a kernel
> with "kexec -p" didn't trigger the handlers).
AFAIK, all shutdown (not remove) routines are called before launching the next
kernel, even in a crash scenario. It is not safe to start the new kernel while
hardware is still doing DMA to system memory and triggering interrupts.
The shutdown routine in the PCI core used to disable MSI/MSI-X on behalf of all
endpoints but it was later decided that this is the responsibility of the
Author: Prarit Bhargava <prarit at redhat.com>
Date: Thu Jan 26 14:07:47 2017 -0500
PCI/MSI: Stop disabling MSI/MSI-X in pci_device_shutdown()
The pci_bus_type .shutdown method, pci_device_shutdown(), is called from
device_shutdown() in the kernel restart and shutdown paths.
Previously, pci_device_shutdown() called pci_msi_shutdown() and
pci_msix_shutdown(). This disables MSI and MSI-X, which causes the device
to fall back to raising interrupts via INTx. But the driver is still bound
to the device, it doesn't know about this change, and it likely doesn't
have an INTx handler, so these INTx interrupts cause "nobody cared"
warnings like this:
irq 16: nobody cared (try booting with the "irqpoll" option)
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.8.2-1.el7_UNSUPPORTED.x86_64 #1
Hardware name: Hewlett-Packard HP Z820 Workstation/158B, BIOS J63 v03.90 06/
The MSI disabling code was added by d52877c7b1af ("pci/irq: let
pci_device_shutdown to call pci_msi_shutdown v2") because a driver left MSI
enabled and kdump failed because the kexeced kernel wasn't prepared to
receive the MSI interrupts.
Subsequent commits 1851617cd2da ("PCI/MSI: Disable MSI at enumeration even
if kernel doesn't support MSI") and e80e7edc55ba ("PCI/MSI: Initialize MSI
capability for all architectures") changed the kexeced kernel to disable
all MSIs itself so it no longer depends on the crashed kernel to clean up
Stop disabling MSI/MSI-X in pci_device_shutdown(). This resolves the
"nobody cared" unhandled IRQ issue above. It also allows PCI serial
devices, which may rely on the MSI interrupts, to continue outputting
messages during reboot/shutdown.
[bhelgaas: changelog, drop pci_msi_shutdown() and pci_msix_shutdown() calls
Signed-off-by: Prarit Bhargava <prarit at redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas at google.com>
CC: Alex Williamson <alex.williamson at redhat.com>
CC: David Arcari <darcari at redhat.com>
CC: Myron Stowe <mstowe at redhat.com>
CC: Lukas Wunner <lukas at wunner.de>
CC: Keith Busch <keith.busch at intel.com>
CC: Mika Westerberg <mika.westerberg at linux.intel.com>
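For illustration only, here is a rough sketch of what "clearing MSIs" amounts to
at the config-space level, whether a driver's shutdown routine or an early-boot
quirk does it: walk the capability list and clear the enable bit in the Message
Control word of the MSI and MSI-X capabilities. This is a userspace simulation
over a 256-byte array, not kernel code and not the code from this patch series.
The register offsets and capability IDs are the standard PCI ones; the helper
names (cfg_read16, cfg_write16, clear_msi_enable) are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Standard PCI config-space offsets, capability IDs and flag bits. */
#define PCI_STATUS            0x06
#define PCI_STATUS_CAP_LIST   0x10   /* device implements a capability list */
#define PCI_CAPABILITY_LIST   0x34   /* offset of the first capability */
#define PCI_CAP_ID_MSI        0x05
#define PCI_CAP_ID_MSIX       0x11
#define PCI_MSI_FLAGS         2      /* Message Control, relative to the cap */
#define PCI_MSI_FLAGS_ENABLE  0x0001 /* MSI Enable */
#define PCI_MSIX_FLAGS_ENABLE 0x8000 /* MSI-X Enable */

/* Hypothetical accessors for a simulated, little-endian config space. */
static uint16_t cfg_read16(const uint8_t *cfg, unsigned int off)
{
    return (uint16_t)(cfg[off] | (cfg[off + 1] << 8));
}

static void cfg_write16(uint8_t *cfg, unsigned int off, uint16_t val)
{
    cfg[off] = (uint8_t)(val & 0xff);
    cfg[off + 1] = (uint8_t)(val >> 8);
}

/*
 * Hypothetical helper: clear the MSI and MSI-X enable bits of one function.
 * Returns how many capabilities had their enable bit set and were cleared.
 */
static int clear_msi_enable(uint8_t cfg[256])
{
    int cleared = 0;
    int ttl = 48;                    /* bound the walk if the list loops */
    uint8_t pos;

    assert(cfg != NULL);

    if (!(cfg_read16(cfg, PCI_STATUS) & PCI_STATUS_CAP_LIST))
        return 0;                    /* no capability list at all */

    pos = cfg[PCI_CAPABILITY_LIST] & 0xfc;
    while (pos >= 0x40 && ttl-- > 0) {
        uint8_t id = cfg[pos];
        uint16_t mask = 0;

        if (id == PCI_CAP_ID_MSI)
            mask = PCI_MSI_FLAGS_ENABLE;
        else if (id == PCI_CAP_ID_MSIX)
            mask = PCI_MSIX_FLAGS_ENABLE;

        if (mask) {
            uint16_t ctrl = cfg_read16(cfg, pos + PCI_MSI_FLAGS);

            if (ctrl & mask) {
                cfg_write16(cfg, pos + PCI_MSI_FLAGS,
                            (uint16_t)(ctrl & ~mask));
                cleared++;
            }
        }
        pos = cfg[pos + 1] & 0xfc;   /* next-capability pointer */
    }
    return cleared;
}
```

Note that the sketch only clears the enable bits. As the changelog above
explains, a device whose MSI is disabled may fall back to INTx, so a real
implementation has to think about when this is done and who is still bound to
the device.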
> But this case is even worse, because the NICs were in PCI passthrough
> mode, using vfio. So, they were completely unaware of what happened
> in the host kernel.
> Also, this is spec compliant - system reset events should guarantee the
> bits are cleared (although kexec is not exactly a system reset, it's