PATCH/RFC: [kdump] fix APIC shutdown sequence
Eric W. Biederman
ebiederm at xmission.com
Wed Aug 8 11:21:23 EDT 2007
Martin Wilck <martin.wilck at fujitsu-siemens.com> writes:
> Hello Eric,
>> How bad is it if you just run with irqpoll in the kdump kernel?
>> If running with irqpoll is usable that is probably preferable
>> to putting in a hardware work around we can survive without.
> Yes, I tried that. No effect.
Ok. Later in the thread it sounds like you have retried this and
irqpoll is working now.
>> Have you done any looking at moving where the kernel initializes
>> io_apics? One of the todo items on the path is to leave
>> io_apic mode enabled and just startup the kernel in io_apic
> I have tried to recover from the "IRR set" situation in several ways by
> changing setup_IO_APIC_irq(). But I haven't found a way to recover from
> this situation once disable_IO_APIC() had been called.
Yes. The long term goal is to remove the need for calling
disable_IO_APIC(), because that makes the code simpler.
Once we get the kernel to the point where it can start in
ioapic mode (and not in i8259 mode) we can remove the
disable code from the kexec on panic path.
> I concluded that the sequence of events
> "send INT message - never receive EOI - disable IO-APIC pin"
> messes up the IO-APIC (at least this specific one in the
> PCIEx-PCI bridge of the ICH7).
It is quite possible. I have observed a lot of obscure bugs in the
corner cases of the state machines, although it is possible
this is correct behavior and it is just specific to level
triggered interrupts which are almost exclusively not on
the first ioapic in a system like you describe.
I suspect the issue is that we never send the EOI message from
the local apic, and so it waits forever. Or that we have
reprogrammed the vectors by the time we send the EOI message
so that the EOI and the ioapic don't agree on the vector
number when the EOI message is sent. Grumble silly level
triggered interrupts grumble.