[PATCH 1/2] boot: ignore early NMIs

Fernando Luis Vázquez Cao fernando at oss.ntt.co.jp
Mon Mar 12 21:43:37 EDT 2012


On 03/13/2012 03:40 AM, H. Peter Anvin wrote:

> On 03/11/2012 11:14 PM, Fernando Luis Vázquez Cao wrote:
>> The thing is that we want to avoid playing with hardware in the kdump
>> reboot path when we can avoid it, the premise being that it cannot
>> be accessed without risking a lockup or worse (as the deadlock accessing
>> the I/O APIC showed). The kernel is crashing after all. What is more,
>> I forgot to mention that the long term goal is to leave the LAPIC
>> untouched too (we really want to keep the number of things we do in the
>> context of the crashing kernel to the bare minimum), so we would still
>> need to fix the early IDT.
>>
>> My patch set just installs a special handler for the NMI case so I think
>> it is pretty simple and self contained.
>>
>> Another reason to apply these patches is to be consistent with the rest
>> of the kernel. Spurious NMIs that would have been ignored after installing
>> the final IDT would cause the system to halt if they happen
>> to arrive while the early IDT is in place.
> I'm concerned that you're adding failure modes


This patch set just brings the early IDT in line with what we do after
switching to the final IDT, i.e. we ignore NMIs. The only difference is
that we do not honor panic*nmi and unknown_nmi_panic. As things
stand now the kernel will sometimes mysteriously hang. I really
think that independently of the kdump problem it would be nice
to be consistent in this regard.
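
To make the intended behaviour concrete, here is a rough user-space
model in C. It is not the code from the patch set: the table, the
handler names (early_fatal, early_ignore_nmi) and setup_early_idt()
are all hypothetical and only illustrate the idea that every early
vector is fatal except the NMI vector, which simply returns. The one
architectural fact used is that NMI is vector 2 on x86.

```c
#include <assert.h>

#define NR_EARLY_VECTORS 32
#define NMI_VECTOR       2   /* architectural NMI vector on x86 */

/* What happens to the (simulated) boot after an event is delivered. */
enum outcome { RESUMED, HALTED };

typedef enum outcome (*early_handler_t)(int vector);

/* Hypothetical default early handler: any event is treated as fatal. */
static enum outcome early_fatal(int vector)
{
    (void)vector;
    return HALTED;
}

/* Hypothetical NMI handler: do nothing and resume, i.e. ignore the NMI. */
static enum outcome early_ignore_nmi(int vector)
{
    (void)vector;
    return RESUMED;
}

/* Stand-in for the early IDT: one handler per vector. */
static early_handler_t early_idt[NR_EARLY_VECTORS];

/* Fill the table; optionally install the special NMI entry. */
static void setup_early_idt(int ignore_nmi)
{
    for (int i = 0; i < NR_EARLY_VECTORS; i++)
        early_idt[i] = early_fatal;
    if (ignore_nmi)
        early_idt[NMI_VECTOR] = early_ignore_nmi;
}

/* Simulate delivery of an event on the given vector. */
static enum outcome deliver(int vector)
{
    assert(vector >= 0 && vector < NR_EARLY_VECTORS);
    return early_idt[vector](vector);
}
```

In this model, calling setup_early_idt(0) and then deliver(NMI_VECTOR)
gives HALTED (the current behaviour), while setup_early_idt(1) makes
the same call give RESUMED and leaves every other vector fatal.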

> because you don't want to
> solve the real problem which is you need to block this at the source.

Indeed, I want to do both: block NMIs at the source when possible,
and install an IDT that ignores NMIs to cover our backs (avoiding
triple faults and lockups). As Eric mentioned, it is not clear that
we can always identify and stop all the sources.


> It is way more than the IDT that has to work (at the very least, you
> need the GDT and a working stack) at all times in order for NMIs to be
> receivable.

Of course, and that is what the follow-up patch set does. It just needs
some more testing. The patches I sent make use of the early GDT and
they work as expected.

> That doesn't address what happens if you're getting an NMI
> storm either.

Well, the same applies to the final IDT. As I mentioned before
I think we should also try to stop things at the source when
it is safe (of course, first we need to identify all the sources).

- Fernando


