[PATCH], issue EOI to APIC prior to calling crash_kexec in die_nmi path

Wed Feb 6 18:50:47 EST 2008

On Thu, Feb 07, 2008 at 12:36:57AM +0100, Ingo Molnar wrote:
> 
> * H. Peter Anvin <hpa at zytor.com> wrote:
> 
> >> I am wondering if interrupts are disabled on crashing cpu or if 
> >> crashing cpu is inside die_nmi(), how would it stop/prevent delivery 
> >> of NMI IPI to other cpus.
> >
> > I don't see how it would.
> 
> cross-CPU IPIs are a bit fragile on some PC platforms. So if the kexec 
> code relies on getting IPIs to all other CPUs, it might not be able to 
> do it reliably. There might be limitations on how many APIC irqs there 
> can be queued at a time, and if those slots are used up and the CPU is 
> not servicing irqs then stuff gets retried. This might even affect NMIs 
> sent via APIC messages - not sure about that.

- Kexec code does not wait infinitely for destination cpu to respond to
  NMI. If destination cpu does not reposond in certain amount of time,
  execution continues. So even if NMI was not delivered to destination
  cpu kexec code should have continued. (Dangerous though, as we don't
  know what other cpu will be doing in the mean time.)

- Even if there is a limitation on how many interrupts can be queued up
  (including NMI), I am not sure how this patch will help that situation.
  This patch is not doing anything on destination cpu (assuming destination
  cpu is also not executing die_nmi() at the same time)

  In fact, even if other cpus are servicing die_nmi() they will end up
  spinning on kexec_lock inside crash_kexec().

I think there is more to this problem then just EOI stuff.

Thanks
Vivek