[Regression] kdump fails to get DHCP address unless booting with pci=nomsi or without nr_cpus=1
Coiby Xu
coxu at redhat.com
Tue Aug 12 04:07:56 PDT 2025
On Tue, Aug 12, 2025 at 11:17:04AM +0100, Marc Zyngier wrote:
>On Tue, 12 Aug 2025 11:09:12 +0100,
>Coiby Xu <coxu at redhat.com> wrote:
>>
>> On Mon, Aug 11, 2025 at 03:52:04PM +0100, Marc Zyngier wrote:
>> > On Mon, 11 Aug 2025 14:03:21 +0100,
>> > Thomas Gleixner <tglx at linutronix.de> wrote:
>> >>
>> >> On Mon, Aug 11 2025 at 15:02, Thomas Gleixner wrote:
>> >>
>> >> CC+ Marc
>> >>
>> >> > On Mon, Aug 11 2025 at 11:23, Coiby Xu wrote:
>> >> >> Recently I hit an issue where, on certain virtual machines, the kdump
>> >> >> kernel fails to get a DHCP IP address most of the time, starting from
>> >> >> 6.11-rc2. git bisection shows commit b5712bf89b4b ("irqchip/gic-v3-its:
>> >> >> Provide MSI parent for PCI/MSI[-X]") is the 1st bad commit,
>> >> >>
>> >> >> # good: [7d189c77106ed6df09829f7a419e35ada67b2bd0] PCI/MSI: Provide
>> >> >> # MSI_FLAG_PCI_MSI_MASK_PARENT
>> >> >> git bisect good 7d189c77106ed6df09829f7a419e35ada67b2bd0
>> >> >> # good: [48f71d56e2b87839052d2a2ec32fc97a79c3e264] irqchip/gic-v3-its:
>> >> >> # Provide MSI parent infrastructure
>> >> >> git bisect good 48f71d56e2b87839052d2a2ec32fc97a79c3e264
>> >> >> # good: [8c41ccec839c622b2d1be769a95405e4e9a4cb20] irqchip/irq-msi-lib:
>> >> >> # Prepare for PCI MSI/MSIX
>> >> >> git bisect good 8c41ccec839c622b2d1be769a95405e4e9a4cb20
>> >> >> # first bad commit: [b5712bf89b4bbc5bcc9ebde8753ad222f1f68296]
>> >> >> # irqchip/gic-v3-its: Provide MSI parent for PCI/MSI[-X]
>> >> >
>> >> > There were follow up fixes on this, so isolating this one is not really
>> >> > conclusive.
>> >> >
>> >> > Is the problem still there on v6.16 and v6.17-rc1?
>> >
>> > Yeah, there are way too many things that have been addressed since.
>> > kdump is also a particularly nasty case, as it tends to rely on the
>> > redistributor tables programmed by the previous kernel.
>>
>> Thanks for providing a clue. This may also explain why I fail to
>> reproduce this issue against the 1st kernel even with the same cmdline
>> as the kdump kernel.
>
>I'm not sure that's a clue. It's only an indication that things are
>not necessarily easy to spot.
>
>Has it ever been reproduced on bare metal? Have you tried v6.16 as
>instructed?
Thanks for replying so quickly!
No, I haven't reproduced it on a bare metal machine, and our QE engineers
haven't noticed this issue on any bare metal machine either.
And I can confirm this issue still happens with 6.16.0-200.fc42.aarch64
and 6.17.0-0.rc1.17.fc43.aarch64 on the type of KVM VMs (QEMU PnP device
PNP0c02) where the issue was found.
>
>>
>> >
>> > Also, this says "virtual machines". What's the hypervisor?
>>
>> I'll contact the lab administrator. What kind of info should I collect
>> to help you narrow down the issue?
>
>Surely you know what hypervisor you're running on, right?
Yes, the hypervisor is KVM. Sorry, I thought merely naming the
hypervisor wouldn't be sufficient, and I misread your request as asking
for more details on the host machine.
>
>>
>> > How hard is it to reproduce?
>>
>> It can be reproduced reliably on certain machines. But as of writing I
>> haven't reproduced it on other KVM virtual machines on three different
>> host machines.
>
>Which machines? I'm sorry, but if you want help on this, you'll have
>to provide actual information.
Sorry, I didn't mean to be vague. I thought your question was about how
reproducible this issue is, so I saw no need to provide details on the
machines where I can't reproduce it. Since you explicitly requested
them, I'll be glad to share the details.
I just grabbed three arbitrary bare metal machines having Fedora-42
installed and launched some KVM VMs to see if this issue can be
reproduced easily. Two of the host machines are as follows (sorry, I
can't find the info for the 3rd one):
- GIGABYTE PnP device PNP0c02, ARMv8 (M128-30)
- LTHPCSR112 (01234567890123456789AB), ARMv8 (Q80-30)
The virtual machine image is downloaded from
https://download.fedoraproject.org/pub/fedora/linux/releases/42/Cloud/aarch64/images/Fedora-Cloud-Base-Generic-42-1.1.aarch64.qcow2.
I tried different vCPU counts (2, 4), different RAM sizes (4G, 35G), and
two different UEFI firmware builds (the default one and the one from the
edk2-experimental package), but haven't reproduced this issue so far.
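
For anyone retrying this, the guest configurations mentioned above can be
enumerated with a small sketch. It only lists combinations and does not
launch any VM. It assumes the full cross product of the values is of
interest, which this message does not state, and the firmware labels are
placeholder strings, not real package or file names.

```python
from itertools import product

# Values taken from the message; firmware labels are hypothetical
# placeholders for "the default one" and "the edk2-experimental one".
VCPUS = (2, 4)
RAM_GIB = (4, 35)
FIRMWARE = ("default", "edk2-experimental")


def test_matrix():
    """Return every (vcpus, ram_gib, firmware) combination as a tuple."""
    return list(product(VCPUS, RAM_GIB, FIRMWARE))


if __name__ == "__main__":
    for vcpus, ram_gib, firmware in test_matrix():
        print(f"vcpus={vcpus} ram={ram_gib}G firmware={firmware}")
```

Recording the outcome of each combination next to its tuple makes it
easier to say precisely which guest shapes have been ruled out.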
>
>Thanks,
>
> M.
>
>--
>Without deviation from the norm, progress is not possible.
>
--
Best regards,
Coiby