[PATCH 1/2] boot: ignore early NMIs
Eric W. Biederman
ebiederm at xmission.com
Mon Mar 12 16:01:12 EDT 2012
"H. Peter Anvin" <hpa at zytor.com> writes:
> On 03/11/2012 11:14 PM, Fernando Luis Vázquez Cao wrote:
>>
>> The thing is that we want to avoid playing with hardware in the kdump
>> reboot path when we can avoid it, the premise being that it cannot
>> be accessed without risking a lockup or worse (as the deadlock accessing
>> the I/O APIC showed). The kernel is crashing after all. What is more,
>> I forgot to mention that the long term goal is to leave the LAPIC
>> untouched too (we really want to keep the number of things we do in the
>> context of the crashing kernel to the bare minimum), so we would still
>> need to fix the early IDT.
>>
>> My patch set just installs a special handler for the NMI case so I think
>> it is pretty simple and self contained.
>>
>> Another reason to apply these patches is to be consistent with the rest
>> of the kernel. Spurious NMIs that would have been ignored after installing
>> the final IDT would cause the system to halt if they happen
>> to arrive while the early IDT is in place.
>>
>
> I'm concerned that you're adding failure modes because you don't want to
> solve the real problem which is you need to block this at the source.
> It is way more than the IDT that has to work (at the very least, you
> need the GDT and a working stack) at all times in order for NMIs to be
> receivable. That doesn't address what happens if you're getting an NMI
> storm either.
Good criticism.
The basic problem is what do we do when we receive NMIs during the
kernel boot. Dying mysteriously certainly isn't a good solution.
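As a rough illustration of the behavior under discussion, the dispatch decision can be modelled in plain C. An NMI (vector 2) that arrives while the early IDT is installed is ignored and returned from. Any other early exception still halts. This is a hosted simulation, not the patch itself. `early_exception_action` and `enum early_action` are hypothetical names. The real early handler is assembly, and it can only run at all if a valid GDT and stack are in place.

```c
/* Hypothetical model of the early-boot exception dispatch; not kernel code. */

/* x86 exception vector 2 is the non-maskable interrupt (NMI). */
#define X86_VECTOR_NMI 2u

/* What the hypothetical early handler does with an event. */
enum early_action {
    EARLY_IRET, /* ignore the event and return to the interrupted code */
    EARLY_HALT  /* report the unexpected exception and stop */
};

/* Treat an NMI as spurious and ignore it, as the final IDT would;
 * every other exception this early in boot remains fatal. */
static enum early_action early_exception_action(unsigned int vector)
{
    return vector == X86_VECTOR_NMI ? EARLY_IRET : EARLY_HALT;
}
```

For example, early_exception_action(2) yields EARLY_IRET, while early_exception_action(14), a page fault, yields EARLY_HALT.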
In the kexec-on-panic code path we already have a stack, and as long
as we can fit the GDT and the IDT on that same page, we can have all of
the rest during the entire transition.
After that, it is basically the kernel's boot code.
The basic problem is: at which source do we block this? How many
sources are there? And architecturally, last I looked, x86 no longer
has an NMI disable beyond the legacy one in the CMOS/RTC index register,
and EFI and similar systems want to get away without a legacy CMOS
clock because designers so often get them wrong.
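For context on the "NMI disable" point, PC-compatible hardware traditionally uses bit 7 of the byte written to the RTC/CMOS index port, 0x70, as its NMI mask. The sketch below shows how such a byte is composed, assuming that legacy layout. `cmos_index_byte` is a hypothetical helper. Whether a given chipset actually honors the bit is exactly the uncertainty raised above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Legacy PC layout (assumed): port 0x70 selects a CMOS register, port 0x71
 * carries its data, and bit 7 of the index byte masks NMIs. */
#define CMOS_INDEX_PORT 0x70
#define CMOS_DATA_PORT 0x71
#define CMOS_NMI_DISABLE_BIT 0x80u

/* Hypothetical helper: build the byte to write to CMOS_INDEX_PORT that
 * selects CMOS register `reg` (0..127) and sets or clears the NMI mask. */
static uint8_t cmos_index_byte(uint8_t reg, bool nmi_disable)
{
    uint8_t value = (uint8_t)(reg & 0x7Fu); /* keep only the register bits */
    if (nmi_disable)
        value |= CMOS_NMI_DISABLE_BIT;      /* request NMI masking */
    return value;
}
```

A caller would write the result to port 0x70 with an outb() before accessing port 0x71. This ties NMI masking to the very legacy CMOS hardware that EFI-era designs want to drop.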
Eric
More information about the kexec mailing list