[PATCH], issue EOI to APIC prior to calling crash_kexec in die_nmi path

Ingo Molnar mingo at elte.hu
Wed Feb 6 19:39:18 EST 2008

* Eric W. Biederman <ebiederm at xmission.com> wrote:

> Looking at the patch the local_irq_enable() is totally bogus.  As soon 
> was we hit machine_crash_shutdown the first thing we do is disable 
> irqs.


> I'm wondering if someone was using the switch cpus on crash patch that 
> was floating around.  That would require the ipis to work.
> I don't know if nmi_exit makes sense.  There are enough layers of 
> abstraction in that piece of code I can't quickly spot the part that 
> is banging the hardware.
> The location of nmi_exit in the patch is clearly wrong.  crash_kexec 
> is a noop if we don't have a crash kernel loaded (and if we are not 
> the first cpu into it), so if we don't execute the crash code 
> something weird may happen.  Further the code is just more 
> maintainable if that kind of code lives in machine_crash_shutdown.

nmi_exit() has no hw effects - it's just our own bookeeping.

the hw knows that we finished the NMI when we do an iret. Perhaps that's 
the bug or side-effect that made the difference: via enabling irqs we 
get an irq entry, and that does an iret and clears the NMI nested state 
- allowing the kexec context to proceed? I suspect kexec() will do an 
iret eventually (at minimum in the booted up kernel's context) - all 
NMIs are blocked up to that point and maybe the APIC doesnt really like 
being frobbed in that state? In any case, the local_irq_enable() is just 
wrong - it's the worst thing a crashing kernel can do. Perhaps doing an 
intentional iret with a prepared stack-let that just restores to 
still-irqs-off state and jumps to the next instruction could 'exit' the 
NMI context without really having to exit it in the kernel code flow?


More information about the kexec mailing list