PATCH/RFC: [kdump] fix APIC shutdown sequence
vgoyal at in.ibm.com
Wed Aug 8 10:42:39 EDT 2007
On Wed, Aug 08, 2007 at 10:06:13AM -0400, Chip Coldwell wrote:
> On Wed, 8 Aug 2007, Vivek Goyal wrote:
> > On Tue, Aug 07, 2007 at 07:41:30PM +0200, Martin Wilck wrote:
> > >
> > > Can you explain how, on the front side bus, the IO-APIC knows whether
> > > a CPU has accepted the INT message? There is no response
> > > to the INT message on the bus, except for the EOI which comes much later.
> > > I'm not saying that you're wrong, I just really don't understand this
> > > point.
> > >
> > I don't know what is exactly hardware protocol. I am just going by
> > intel documentation.
> I think it's important to distinguish between the LAPIC receiving an
> interrupt and the CPU receiving an interrupt. The former could happen
> without the latter if the CPU has set the TPR above the priority of
> the interrupt received by the LAPIC. In that case, the interrupt is
> kept pending in the LAPIC and recorded in the IRR if I understand the
> Intel documentation correctly.
> So I think the scenario which leaves IRR set when the kdump kernel
> starts is possible.
That's true. I agree that once kdump kernel starts we very well can be
in a situation where ISR/IRR bits of LAPIC are set and IRR bit of IOAPIC
is set. These are pending interrupt which will be delivered in the
second kernel and that kernel will reject these pending interrupts as spurious
interrupt and send an EOI. This will clear the LAPIC state.
But the issue here seems to be that LAPIC state got clear but IRR bit
at IOAPIC bit is not cleared because IOAPIC vector information was deleted
in first kernel and now upon receiving EOI, it does not know this EOI belongs
to which vector.
Given the fact that "irqpoll" makes things work, I think we should not
complicate the logic and leave it like that. Anyway, our goal is to capture
the dump and not make sure in second kernel interrupt from all the devices
Otherwise we shall have to resort to techniques like first masking all
the interrupts at IOAPIC level, then issuing EOI on all the cpus in the
first kernel itself. It makes the logic little twisted.
More information about the kexec