[PATCH], issue EOI to APIC prior to calling crash_kexec in die_nmi path

Neil Horman nhorman at tuxdriver.com
Fri Feb 8 11:14:22 EST 2008

On Thu, Feb 07, 2008 at 01:24:04PM +0100, Ingo Molnar wrote:
> * Neil Horman <nhorman at tuxdriver.com> wrote:
> > Ingo noted a few posts down that nmi_exit doesn't actually write to the
> > APIC EOI register, so yeah, I agree, it's bogus (and I apologize, I
> > should have checked that more carefully).  Nevertheless, this patch
> > consistently allowed a hanging machine to boot through an NMI lockup.
> > So I'm forced to wonder what's going on then that this patch helps
> > with.  Perhaps it's just a very fragile timing issue, I'll need to
> > look more closely.
> try a dummy iret, something like:
>   asm volatile ("pushf; push $1f; iret; 1: \n");
> to get the CPU out of its 'nested NMI' state. (totally untested)
> the idea is to push down an iret frame to the kernel stack that will
> just jump to the next instruction and get it out of the NMI nesting.
> Note: interrupts will/must still be disabled, despite the iret. (the 
> ordering of the pushes might be wrong, we might need more than that for 
> a valid iret, etc. etc.)
> 	Ingo

I just tried this experiment and it worked.  With the dummy iret executed,
the kdump kernel booted successfully.

Thoughts on how we should handle this from here?


 * Neil Horman <nhorman at tuxdriver.com>
 * Software Engineer, Red Hat

